GENE CLUSTER FOR PRODUCTION OF THE ENEDIYNE 
ANTITUMOR ANTIBIOTIC C-1027 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. §119 of provisional 
application USSN 60/115,434, filed on January 6, 1999, which is herein incorporated by 
reference in its entirety for all purposes. 

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

This work was supported in part by a grant from the Cancer Research 
Coordinating Committee, University of California, the National Institutes of Health grant 
CA78747, and the Searle Scholars Program/The Chicago Community Trust. The 
Government of the United States of America may have certain rights in this invention. 

FIELD OF THE INVENTION 

This invention relates to the field of enediyne antibiotics. In particular, this 
invention elucidates the gene cluster controlling the biosynthesis of the C-1027 enediyne. 

BACKGROUND OF THE INVENTION 

The enediyne antibiotics are currently the focus of intense research activity in 
the fields of chemistry, biology, and medical sciences, because of their unique molecular 
architecture, biological activities, and modes of action (Doyle and Borders (1995) Enediyne 
antibiotics as antitumor agents. Marcel-Dekker, New York; Thorson et al. (1999) Bioorg. 
Chem., 27: 172-188). Since the unveiling of the structure of neocarzinostatin chromophore 
(Edo et al. (1985) Tetrahedron Lett. 26: 331-340) in 1985, the enediyne family has grown 
steadily. Thus far, there have been three basic groups within the enediyne antibiotic family: 
(a) the calicheamicin/esperamicin type, which includes the calicheamicins, the esperamicins, 
and namenamicin, (b) the dynemicin type, and (c) the chromoprotein type, consisting of an 
apoprotein and an unstable enediyne chromophore. The latter group includes 
neocarzinostatin, kedarcidin, C-1027 (Fig. 1), and maduropeptin, whose enediyne 
chromophore structures have been established, as well as several others whose enediyne 
chromophore structures are yet to be determined due to their instability (Thorson et al. 
(1999) Bioorg. Chem., 27: 172-188). N1999A2, in contrast to the other chromoproteins, 
exists as an enediyne chromophore alone despite the fact that its structure is very similar to 
the other chromoprotein chromophores (Ando et al. (1998) Tetra. Letts., 39: 6495-6480). 

As a family, the enediyne antibiotics are the most potent, highly active 
antitumor agents ever discovered. Some members are 1000 times more potent than 
adriamycin, one of the most effective, clinically used antitumor antibiotics (Zhen et al. 
(1989) J. Antibiot. 42: 1294-1298). All members of this family contain a unit consisting of 
two acetylenic groups conjugated to a double bond or incipient double bond within a nine- or 
ten-membered ring; i.e., the enediyne core as exemplified by C-1027 in Fig. 1. As a 
consequence of this structural feature, these compounds share a common mechanism of 
action: the enediyne core undergoes an electronic rearrangement to form a transient 
benzenoid diradical, which is positioned in the minor groove of DNA so as to damage DNA 
by abstracting hydrogen atoms from deoxyriboses on both strands (Fig. 1). Reaction of the 
resulting deoxyribose carbon-centered radicals with molecular oxygen initiates a process that 
results in both single-strand and double-strand DNA cleavages (Doyle and Borders (1995) 
Enediyne antibiotics as antitumor agents. Marcel-Dekker, New York; Ikemoton et al. (1995) 
Proc. Natl. Acad. Sci. USA 92: 10506-10510; Myers et al. (1991) J. Am. Chem. Soc. 119: 
2965-2972; Stassinopoulos et al. (1996) Science 272: 1943-1946; Thorson et al. (1999) 
Bioorg. Chem., 27: 172-188; Xu et al. (1997) J. Am. Chem. Soc. 119: 1133-1134). This 
novel mechanism of DNA damage has important implications for their application as potent 
cancer chemotherapeutic agents (Doyle and Borders (1995) supra; Sievers et al. (1999) 
Blood 93: 3678-3684). 

As an alternative to making structural analogs of microbial metabolites by 
chemical synthesis, manipulation of the genes governing secondary metabolism offers a 
promising route to preparing these compounds biosynthetically (Cane et al. 
(1998) Science 282: 63-68; Hutchinson and Fujii (1995) Ann. Rev. Microbiol. 49: 201-38; 
Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875-912). The success of the latter 
approach depends critically on the availability of novel genetic systems and on genes 
encoding novel enzyme activities. The enediynes offer a distinct opportunity to study the 
biosynthesis of their unique molecular scaffolds and the mechanism of self-resistance to 
extremely cytotoxic natural products. Elucidation of these aspects provides access to 
rational engineering of enediyne biosynthesis for novel drug leads and makes it possible to 
construct enediyne-overproducing strains by de-regulating the biosynthetic machinery. In 
addition, elucidation of an enediyne gene cluster contributes to the general field of 
combinatorial biosynthesis by expanding the repertoire of novel polyketide synthase (PKS) 
and deoxysugar biosynthesis genes as well as other genes uniquely associated with enediyne 
biosynthesis, leading to the making of novel enediynes via combinatorial biosynthesis. 

SUMMARY OF THE INVENTION 

This invention provides nucleic acid sequences and characterization of the 
gene cluster responsible for the biosynthesis of the enediyne C-1027 (produced by 
Streptomyces globisporus). In particular, structural and functional characterization is 
provided for the 50 open reading frames (ORFs) comprising this gene cluster. Thus, in one 
embodiment, this invention provides an isolated nucleic acid comprising a nucleic acid 
selected from the group consisting of a nucleic acid encoding any of C-1027 open reading 
frames (ORFs) -7 through 42, excluding ORF 9 (cagA); a nucleic acid encoding a 
polypeptide encoded by any of C-1027 open reading frames (ORFs) -7 through 42, excluding 
ORF 9 (cagA); and a nucleic acid amplified by polymerase chain reaction (PCR) using 
primer pairs that amplify any of C-1027 open reading frames (ORFs) -7 through 42, 
excluding ORF 9 (cagA). In one embodiment, preferred nucleic acids comprise a nucleic 
acid encoding at least two (more preferably at least three or more) open reading frames 
(ORFs) selected from the group consisting of ORF-1 through ORF 42, excluding ORF 9 
(cagA). 

In another embodiment, this invention provides an isolated nucleic acid 
comprising a nucleic acid that specifically hybridizes under stringent conditions to an open 
reading frame (ORF) of the C-1027 biosynthesis gene cluster, excluding ORF 9 (cagA), and 
can substitute for the ORF to which it specifically hybridizes to direct the synthesis of an 
enediyne. In certain embodiments this also includes nucleic acids that would hybridize 
under stringent conditions as indicated above but for the degeneracy of the genetic code. In 
other words, if silent mutations could be made in the subject sequence so that it hybridizes 
to the indicated sequence(s) under stringent conditions, it would be included in certain 
embodiments. Particularly preferred nucleic acids comprise a nucleic acid that specifically 
hybridizes under stringent conditions to a nucleic acid selected from the group consisting of 
ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, 
ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, 
ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, 
ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, 
ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. A particularly preferred 
isolated nucleic acid comprises a nucleic acid selected from the group consisting of ORF -7, 
ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, 
ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 
16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 
26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 
36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42. The nucleic acid may 
comprise a nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid 
selected from the group consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, 
ORF -1, ORF 0, ORF 1, ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 10, 
ORF 11, ORF 12, ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, 
ORF 21, ORF 22, ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, 
ORF 31, ORF 32, ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, 
ORF 41, and ORF 42. 

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a 
C-1027 enediyne analogue. The gene cluster may be present in a cell, more preferably in a 
bacterial cell (e.g. Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or 
Streptomycetes). Particularly preferred bacterial cells include, but are not limited to, 
Streptomyces globisporus, Streptomyces lividans, Streptomyces coelicolor, Micromonospora 
echinospora ssp. calichensis, Actinomadura verrucosospora, Micromonospora chersina, 
Streptomyces carzinostaticus, and Actinomycete L585-6. The gene cluster may contain one 
or more open reading frames operatively linked to a heterologous promoter (e.g. a 
constitutive or an inducible promoter). 

This invention also provides a polypeptide encoded by any one or more 
of the nucleic acids described herein. 

Also provided are host cell(s) (e.g. eukaryotic cells or bacterial cells as 
described herein) transformed with one or more of the expression vectors described herein. 
Preferred host cells are transformed with an exogenous nucleic acid comprising a gene 
cluster encoding polypeptides sufficient to direct the assembly of a C-1027 enediyne or a 
C-1027 enediyne analogue. In certain embodiments, the heterologous nucleic acid may comprise 
only a portion of the gene cluster, but the cell will still be able to express an enediyne. 



This invention also provides methods of chemically modifying a biological 
molecule. The methods involve contacting a biological molecule that is a substrate for a 
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame with a 
polypeptide encoded by a C-1027 biosynthesis gene cluster open reading frame, whereby the 
polypeptide chemically modifies the biological molecule. In one preferred embodiment, the 
polypeptide is an enzyme selected from the group consisting of a hydroxylase, a 
homocysteine synthase, a dNDP-glucose dehydrogenase, a citrate carrier protein, a C-methyl 
transferase, an N-methyl transferase, an aminotransferase, a CagA apoprotein, an 
NDP-glucose synthase, an epimerase, an acyl transferase, a coenzyme F390 synthase, an 
epoxide hydrolase, an anthranilate synthase, a glycosyl transferase, a monooxygenase, a 
type II condensation protein, an aminomutase, a type II adenylation protein, an O-methyl 
transferase, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In a preferred 
embodiment the method involves contacting the biological molecule with at least two 
(preferably at least three or more) different polypeptides encoded by C-1027 biosynthesis 
gene cluster open reading frames. The contacting may be in a host cell (e.g. a eukaryotic cell 
or a bacterial cell) or the contacting can be ex vivo. The biological molecule can be an 
endogenous metabolite produced by said host cell or an exogenously supplied metabolite. In 
preferred embodiments, the host cell is a bacterial cell or eukaryotic cell (e.g., a mammalian 
cell, a yeast cell, a plant cell, a fungal cell, an insect cell, etc.). In certain preferred 
embodiments, the host cell synthesizes sugars and glycosylates the biological molecule. In 
other preferred embodiments, the host cell synthesizes deoxysugars. The method can further 
involve contacting the biological molecule with a polyketide synthase or a non-ribosomal 
polypeptide synthetase. The contacting can be in a cell (e.g., a bacterial cell) or ex vivo. In 
one preferred embodiment the method comprises contacting the biological molecule with 
substantially all of the polypeptides encoded by C-1027 biosynthesis gene cluster open 
reading frames and said method produces an enediyne or enediyne analogue. In another 
preferred embodiment, the biological molecule is a fatty acid and the biological molecule is 
contacted with a C-1027 ORF polypeptide selected from the group consisting of an epoxide 
hydrase, a monooxygenase, an iron-sulfur flavoprotein, a P-450 hydroxylase, an 
oxidoreductase, and a proline oxidase. In certain embodiments, the biological molecule is a 
fatty acid and said biological molecule is contacted with a plurality of C-1027 ORF 
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, 
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In one especially preferred 
embodiment, the biological molecule is contacted with polypeptides encoded by ORF 17, 
ORF 20, ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38. In another especially 
preferred embodiment, the biological molecule is contacted with polypeptides encoded by 
ORF 15, ORF 16, ORF 28, ORF 3, ORF 14, and ORF 13, and, in certain embodiments, ORF 
4 and ORF 3 as well. 

In certain embodiments, the method may comprise contacting a sugar with 
one or more C-1027 open reading frame polypeptides selected from the group consisting of a 
dNDP-glucose synthase, a dNDP-glucose dehydratase, an epimerase, an aminotransferase, a 
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. Particularly 
preferred variants of this method comprise contacting a dNDP-glucose with a plurality of 
C-1027 open reading frame polypeptides comprising a dNDP-glucose synthase, a 
dNDP-glucose dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an 
N-methyltransferase, and a glycosyl transferase. 

In certain other embodiments, the method comprises contacting an amino acid 
with one or more C-1027 open reading frame polypeptides selected from the group 
consisting of a hydroxylase, an aminomutase, a type II NRPS condensation enzyme, a type II 
NRPS adenylation enzyme, and a type II peptidyl carrier protein. These methods may 
involve contacting an amino acid with a plurality of C-1027 open reading frame polypeptides 
comprising a hydroxylase, a halogenase, an aminomutase, a type II NRPS condensation 
enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier protein. In 
particularly preferred embodiments, the amino acid is a tyrosine. 

This invention also provides a method of synthesizing a chromoprotein-type 
enediyne core, said method comprising contacting a fatty acid with one or more C-1027 ORF 
polypeptides selected from the group consisting of an epoxide hydrase, a monooxygenase, an 
iron-sulfur flavoprotein, a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In 
preferred embodiments, the fatty acid may be contacted with a plurality of C-1027 ORF 
polypeptides comprising an epoxide hydrase, a monooxygenase, an iron-sulfur flavoprotein, 
a P-450 hydroxylase, an oxidoreductase, and a proline oxidase. In particularly preferred 
embodiments, the fatty acid is contacted with polypeptides encoded by ORF 17, ORF 20, 
ORF 21, ORF 29, ORF 30, ORF 32, ORF 35, and ORF 38. 

In still yet another embodiment, this invention provides a method of 
synthesizing a deoxysugar. This method involves contacting a sugar with one or more 
C-1027 open reading frame polypeptides selected from the group consisting of a dNDP-glucose 
synthase, a dNDP-glucose dehydratase, an epimerase, an aminotransferase, a 
C-methyltransferase, an N-methyltransferase, and a glycosyl transferase. In preferred 
embodiments, this method involves contacting a dNDP-glucose with a plurality of C-1027 
open reading frame polypeptides comprising a dNDP-glucose synthase, a dNDP-glucose 
dehydratase, an epimerase, an aminotransferase, a C-methyltransferase, an 
N-methyltransferase, and a glycosyl transferase. In particularly preferred embodiments, the 
dNDP-glucose is contacted with polypeptides encoded by ORF 17, ORF 20, ORF 21, ORF 29, 
ORF 30, ORF 32, ORF 35, and ORF 38. 

This invention also provides methods of synthesizing a beta amino acid by 
contacting an amino acid with one or more C-1027 open reading frame polypeptides 
selected from the group consisting of a hydroxylase, an aminomutase, a type II NRPS 
condensation enzyme, a type II NRPS adenylation enzyme, and a type II peptidyl carrier 
protein. The method preferably comprises contacting an amino acid with a plurality of 
C-1027 open reading frame polypeptides comprising a hydroxylase, a halogenase, an 
aminomutase, a type II NRPS condensation enzyme, a type II NRPS adenylation enzyme, 
and a type II peptidyl carrier protein. Particularly preferred embodiments comprise 
contacting the amino acid (e.g. tyrosine) with polypeptides encoded by ORF 4, ORF 11, 
ORF 24, ORF 23, ORF 25, and ORF 26. 

Also provided are methods of synthesizing an enediyne or an enediyne 
analogue. These methods involve culturing a cell (e.g. a eukaryotic cell or a bacterium) 
comprising a recombinantly modified C-1027 gene cluster under conditions whereby said 
cell expresses said enediyne or enediyne analogue; and recovering the enediyne or enediyne 
analogue. In preferred embodiments, the gene cluster is present in a bacterium (e.g., 
Actinomycetes, Actinoplanetes, Actinomadura, Micromonospora, or Streptomycetes). 
Particularly preferred bacteria include, but are not limited to, Streptomyces globisporus, 
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. 
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces 
carzinostaticus, and Actinomycete L585-6. In another preferred embodiment, the gene 
cluster is present in a eukaryotic cell (e.g. a mammalian cell, a yeast cell, a plant cell, a 
fungal cell, an insect cell, etc.). The host cell can be one that synthesizes sugars and 
glycosylates the enediyne or enediyne analogue. The host can be one that synthesizes 
deoxysugars. 

This invention also provides a method of making a cell (e.g., a bacterial or 
eukaryotic cell) resistant to an enediyne or an enediyne metabolite. This method involves 
expressing in the cell one or more isolated C-1027 open reading frame nucleic acids that 
encode a protein selected from the group consisting of a CagA apoprotein, a SgcB 
transmembrane efflux protein, a transmembrane transport protein, a Na+/H+ transporter, an 
ABC transporter, a glycerol phosphate transporter, and a UvrA-like protein. In preferred 
embodiments, the isolated C-1027 open reading frame nucleic acids are selected from the 
group consisting of ORF 9, ORF 2, ORF 27, ORF 0, ORF 1 C-terminus, ORF 2, and ORF 1 
N-terminus. Certain embodiments exclude cagA (ORF 9). 

In one embodiment, this invention specifically excludes one or more of open 
reading frames -7 through 42. In particular, in one embodiment this invention excludes cagA 
(ORF 9), and/or sgcA (ORF 1), and/or sgcB (ORF 2). 

DEFINITIONS 

The terms "C-1027 open reading frame", and "C-1027 ORF" refer to an open 
reading frame in the C-1027 biosynthesis gene cluster as isolated from Streptomyces 
globisporus. The term also embraces the same open reading frames as present in other 
enediyne-synthesizing organisms (e.g. other strains and/or species of Streptomyces, 
Actinomyces, and the like). The term encompasses allelic variants and single nucleotide 
polymorphisms (SNPs). In certain instances the C-1027 ORF is used synonymously with the 
polypeptide encoded by the C-1027 ORF and may include conservative substitutions in that 
polypeptide. The particular usage will be clear from context. 

The terms "isolated", "purified", or "biologically pure" refer to material which 
is substantially or essentially free from components which normally accompany it as found 
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to 
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking 
them in nature. 

The terms "polypeptide", "peptide" and "protein" are used interchangeably 
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers 
in which one or more amino acid residues is an artificial chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid 
polymers. The term also includes variants on the traditional peptide linkage joining the 
amino acids making up the polypeptide. 




The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents 
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the 
present invention is preferably single-stranded or double-stranded and will generally contain 
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are 
included that may have alternate backbones, comprising, for example, phosphoramide 
(Beaucage et al. (1993) Tetrahedron 49: 1925 and references therein; Letsinger (1970) J. 
Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986) 
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am. 
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 141 9), 
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No. 
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321), 
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A 
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and 
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed. 
Engl. 31: 1008; Nielsen (1993) Nature, 365: 566; Carlsson et al. (1996) Nature 380: 207). 
Other analog nucleic acids include those with positive backbones (Denpcy et al. (1995) 
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023, 
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30: 
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside 
& Nucleotide 13: 1597; Chapters 2 and 3, ASC Symposium Series 580, "Carbohydrate 
Modifications in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook; Mesmaeker et al. 
(1994), Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR 
34: 17; Tetrahedron Lett. 37: 143 (1996)) and non-ribose backbones, including those described 
in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC Symposium Series 
580, Carbohydrate Modifications in Antisense Research, Ed. Y.S. Sanghui and P. Dan Cook. 
Nucleic acids containing one or more carbocyclic sugars are also included within the 
definition of nucleic acids (see Jenkins et al. (1995), Chem. Soc. Rev. pp 169-176). Several 
nucleic acid analogs are described in Rawls, C & E News June 2, 1997 page 35. These 
modifications of the ribose-phosphate backbone may be done to facilitate the addition of 
additional moieties such as labels, or to increase the stability and half-life of such molecules 
in physiological environments. 

The term "heterologous" as it relates to nucleic acid sequences such as coding 
sequences and control sequences, denotes sequences that are not normally associated with a 
region of a recombinant construct, and/or are not normally associated with a particular cell. 
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of 
nucleic acid within or attached to another nucleic acid molecule that is not found in 
association with the other molecule in nature. For example, a heterologous region of a 
construct could include a coding sequence flanked by sequences not found in association 
with the coding sequence in nature. Another example of a heterologous coding sequence is a 
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences 
having codons different from the native gene). Similarly, a host cell transformed with a 
construct which is not normally present in the host cell would be considered heterologous for 
purposes of this invention. 

A "coding sequence" or a sequence which "encodes" a particular polypeptide 
(e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or 
translated into that polypeptide in vitro and/or in vivo when placed under the control of 
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding 
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop 
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to, 
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or 
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a 
transcription termination sequence will usually be located 3' to the coding sequence. 

Expression "control sequences" refers collectively to promoter sequences, 
ribosome binding sites, polyadenylation signals, transcription termination sequences, 
upstream regulatory domains, enhancers, and the like, which collectively provide for the 
transcription and translation of a coding sequence in a host cell. Not all of these control 
sequences need always be present in a recombinant vector so long as the desired gene is 
capable of being transcribed and translated. 

"Recombination" refers to the reassortment of sections of DNA or RNA 
sequences between two DNA or RNA molecules. "Homologous recombination" occurs 
between two DNA molecules which hybridize by virtue of homologous or complementary 
nucleotide sequences present in each DNA molecule. 

The terms "stringent conditions" or "hybridization under stringent conditions" 
refer to conditions under which a probe will hybridize preferentially to its target 
subsequence, and to a lesser extent to, or not at all to, other sequences. "Stringent 
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid 
hybridization experiments such as Southern and northern hybridizations are sequence 
dependent, and are different under different environmental parameters. An extensive guide 
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in 
Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 
2 Overview of principles of hybridization and the strategy of nucleic acid probe assays, 
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are 
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence 
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength 
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very 
stringent conditions are selected to be equal to the Tm for a particular probe. 

An example of stringent hybridization conditions for hybridization of 
complementary nucleic acids which have more than 100 complementary residues on a filter 
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the 
hybridization being carried out overnight. An example of highly stringent wash conditions is 
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a 
0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al. (1989) Molecular Cloning - A 
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor 
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a 
low stringency wash to remove background probe signal. An example medium stringency 
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An 
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at 
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed 
for an unrelated probe in the particular hybridization assay indicates detection of a specific 
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions 
are still substantially identical if the polypeptides which they encode are substantially 
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum 
codon degeneracy permitted by the genetic code. 
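
By way of illustration only, the example wash conditions and the signal-to-noise criterion recited above can be collected into a small lookup, sketched here in Python. The names `Wash`, `WASHES`, `highly_stringent_temp`, and `is_specific` are hypothetical and are not part of this disclosure; the values are simply the examples quoted in the preceding paragraphs, and the Tm itself is assumed to be supplied by the user rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wash:
    label: str    # stringency label used in the text
    buffer: str   # wash buffer as quoted in the text
    temp_c: int   # wash temperature in degrees Celsius
    minutes: int  # wash duration


# Example wash conditions quoted in the text, ordered from low to high stringency.
WASHES = [
    Wash("low", "4-6x SSC", 40, 15),
    Wash("medium", "1x SSC", 45, 15),
    Wash("stringent", "0.2x SSC", 65, 15),
    Wash("highly stringent", "0.15 M NaCl", 72, 15),
]


def highly_stringent_temp(tm_c: float, offset_c: float = 5.0) -> float:
    """Return a temperature about 5 C below a user-supplied Tm."""
    return tm_c - offset_c


def is_specific(signal: float, unrelated_signal: float, ratio: float = 2.0) -> bool:
    """True when the signal is at least `ratio` times that of an unrelated probe."""
    if unrelated_signal <= 0:
        raise ValueError("unrelated_signal must be positive")
    return signal / unrelated_signal >= ratio
```

For instance, a probe with an assumed Tm of 70°C would be hybridized at about 65°C under this rule, and a signal of 10 units against an unrelated-probe signal of 5 units would meet the 2x criterion.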

Expression vectors are defined herein as nucleic acid sequences that direct 
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in 
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of 
hosts such as bacteria, bluegreen algae, plant cells, insect cells and animal cells. Expression 
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically 
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA 
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed 
expression vector preferably contains: an origin of replication for autonomous replication in 
a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally 
one or more constitutive or inducible promoters. In preferred embodiments, an expression 
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS 
and/or NRPS domains and/or modules is operably linked to suitable control sequences 
capable of effecting the expression of the products of these synthases and/or synthetases in a 
suitable host. Control sequences include a transcriptional promoter, an optional operator 
sequence to control transcription, and sequences which control the termination of 
transcription and translation, and so forth. 

The term "conservative substitution" is used in reference to proteins or 
peptides to reflect amino acid substitutions that do not substantially alter the activity 
(specificity or binding affinity) of the molecule. Typically conservative amino acid 
substitutions involve substitution of one amino acid for another amino acid with similar 
chemical properties (e.g. charge or hydrophobicity). The following six groups each contain 
amino acids that are typical conservative substitutions for one another: 1) Alanine (A), 
Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), 
Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), 
Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). 
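
As a non-limiting sketch, the six groups listed above can be encoded as sets of one-letter codes and used to test whether a given substitution falls within a single group. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and residues outside the six listed groups (e.g. glycine, proline, cysteine, histidine) are simply treated as having no typical conservative partner under this definition.

```python
# The six groups of typical conservative substitutions, as one-letter codes.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True when two different residues belong to the same listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residue: not a substitution at all
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Under this sketch, replacing leucine with valine (`is_conservative("L", "V")`) is conservative, whereas replacing aspartic acid with lysine (`is_conservative("D", "K")`) is not.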

The "group consisting of ORF-1 through ORF 42" refers to the group 
consisting of ORF -7, ORF -6, ORF -5, ORF -4, ORF -3, ORF -2, ORF -1, ORF 0, ORF 1, 
ORF 2, ORF 3, ORF 4, ORF 5, ORF 6, ORF 7, ORF 8, ORF 9, ORF 10, ORF 11, ORF 12, 
ORF 13, ORF 14, ORF 15, ORF 16, ORF 17, ORF 18, ORF 19, ORF 20, ORF 21, ORF 22, 
ORF 23, ORF 24, ORF 25, ORF 26, ORF 27, ORF 28, ORF 29, ORF 30, ORF 31, ORF 32, 
ORF 33, ORF 34, ORF 35, ORF 36, ORF 37, ORF 38, ORF 39, ORF 40, ORF 41, and ORF 42 
as identified in Tables I and II. In certain embodiments ORF 9 (cagA) is excluded. 

A "biological molecule that is a substrate for a polypeptide encoded by an
enediyne (e.g., C-1027) biosynthesis gene" refers to a molecule that is chemically modified
by one or more polypeptides encoded by open reading frame(s) of the C-1027 biosynthesis
gene cluster. The "substrate" may be a native molecule that typically participates in the
biosynthesis of an enediyne, or can be any other molecule that can be similarly acted upon
by the polypeptide.

A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e. the
original "allele") whereas other members may have a mutated sequence (i.e. the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.

"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).
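
The genotype counts given above for diallelic polymorphisms (three genotypes in diploid organisms, two in haploid organisms) can be reproduced by simple enumeration. The following Python sketch is illustrative only; the function names and the allele labels are hypothetical, and genotypes are treated as unordered allele pairs.

```python
from itertools import combinations_with_replacement


def diploid_genotypes(alleles):
    """Unordered pairs of alleles: homozygotes plus heterozygotes."""
    return list(combinations_with_replacement(alleles, 2))


def haploid_genotypes(alleles):
    """A haploid organism carries exactly one allele per locus."""
    return [(allele,) for allele in alleles]


if __name__ == "__main__":
    diallelic = ["original", "variant"]        # hypothetical allele labels
    print(len(diploid_genotypes(diallelic)))   # three genotypes
    print(len(haploid_genotypes(diallelic)))   # two genotypes
    # A triallelic polymorphism gives n(n+1)/2 = 6 diploid genotypes.
    print(len(diploid_genotypes(["A", "C", "T"])))
```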

Abbreviations used herein include LB, Luria-Bertani; NGDH, dNDP-glucose
4,6-dehydratase; nt, nucleotide; ORF, open reading frame; PCR, polymerase chain reaction;
PEG, polyethyleneglycol; PKS, polyketide synthase; RBS, ribosomal binding site; Apr,
apramycin; R, resistant; Th, thiostrepton; WT, wild-type; and TS, temperature sensitive.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 illustrates the structures of C-1027 chromophore and the benzenoid
diradical intermediate proposed to initiate DNA cleavage.

Figure 2 illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of deoxysugars.

Figure 3A illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a β-amino acid.

Figure 3B illustrates a scheme using C-1027 open reading frame polypeptides
for the synthesis of a benzoxazolinate.






Figure 4 illustrates the synthesis of the enediyne core and final assembly of
the C-1027 enediyne.

Figures 5A, 5B, and 5C illustrate the organization of the C-1027 enediyne
biosynthetic gene cluster. Figure 5A shows a restriction map of the 75-kb sgc gene cluster
from S. globisporus as represented by three cosmid clones. Figure 5B illustrates the genetic
organization of the sgcA, sgcB, and cagA genes, showing that they are clustered in the sgc
gene cluster. Probe 1, the 0.55-kb dNDP-glucose 4,6-dehydratase gene fragment from
pBS1002. Probe 2, the 0.73-kb cagA fragment from pBS1003. A, ApaI; B, BamHI; E,
EcoRI; K, KpnI; S, SacII; Sp, SphI. Figure 5C shows the genetic organization of the C-1027
biosynthesis gene cluster.

Figure 6 shows the DNA and deduced amino acid sequences of the 3.0-kb
BamHI fragment from pBS1007, showing the sgcA and sgcB genes. Possible RBSs are
boxed. The presumed translational start and stop sites are in boldface. Restriction enzyme
sites of interest are underlined. The amino acids, according to which the degenerate PCR
primers were designed for amplifying the dNDP-glucose 4,6-dehydratase gene from S.
globisporus, are underlined.

Figure 7 shows the amino acid sequence alignment of SgcA with three other
dNDP-glucose 4,6-dehydratases. Gdh, TDP-glucose 4,6-dehydratase of S. erythraea
(AAA68211); MtmE, TDP-glucose 4,6-dehydratase in the mithramycin pathway of S.
argillaceus (CAA71847); TylA2, TDP-glucose 4,6-dehydratase in the tylosin pathway of S.
fradiae (S49054). Given in parentheses are protein accession numbers. The αβα fold with
the NAD+-binding motif of GxGxxG is boxed.

Figures 8A, 8B, and 8C show disruption of sgcA by single crossover homologous
recombination. Figure 8A shows construction of the sgcA disruption mutant and restriction
maps of the wild-type S. globisporus C-1027 and S. globisporus SB1001 mutant strains
showing predicted fragment sizes upon BamHI digestion. Figures 8B and 8C show a
Southern analysis of S. globisporus C-1027 (lane 1) and S. globisporus SB1001 (lanes 2, 3,
and 4, three individual isolates) genomic DNA, digested with BamHI, using (Figure 8B)
pOJ260 vector or (Figure 8C) the 0.75-kb SacII/KpnI fragment of sgcA from pBS1012 as a
probe, respectively. B, BamHI; K, KpnI; S, SacII.

Figures 9A, 9B, 9C, and 9D illustrate the determination of C-1027 production
in various S. globisporus strains by assaying their antibacterial activity against M. luteus.
Figure 9A: 1, S. globisporus C-1027; 2, 3, and 4, S. globisporus SB1001 (three individual
isolates); 5, S. globisporus AF67; 6, S. globisporus AF40. Figure 9B: 1, S. globisporus
C-1027; 2, S. globisporus SB1001 (pWHM3); 3 and 4, S. globisporus SB1001 (pBS1015) (two
individual isolates). Both S. globisporus SB1001 (pWHM3) and S. globisporus SB1001
(pBS1015) were grown in the presence of 5 μg/mL thiostrepton. Figure 9C: 1, S.
globisporus C-1027; 2, S. globisporus SB1001 (pBS1015); 3, S. globisporus SB1001; 4, S.
globisporus SB1001 (pWHM3); 5, S. globisporus AF40; 6, S. globisporus AF44. All S.
globisporus strains were grown in the absence of thiostrepton. Figure 9D: 1, S. globisporus
(pKC1139); 2, S. globisporus (pBS1018).

DETAILED DESCRIPTION

This invention provides a complete gene cluster regulating the biosynthesis of
C-1027, the most potent member of the enediyne antitumor antibiotic family. C-1027 is
produced by Streptomyces globisporus C-1027 and consists of an apoprotein (encoded by the
cagA gene) and a non-peptidic chromophore. The C-1027 chromophore could be viewed as
being derived biosynthetically from a benzoxazolinate, a deoxyamino hexose, a β-amino
acid, and an enediyne core. Adopting a strategy to clone the C-1027 biosynthesis gene
cluster by mapping a putative dNDP-glucose 4,6-dehydratase (NGDH) gene to cagA, we
localized 75 kb of contiguous DNA from S. globisporus encoding a complete C-1027 gene
cluster.

Initial sequencing of the cloned gene cluster revealed two genes, sgcA and
sgcB, that encode an NGDH enzyme and a transmembrane efflux protein, respectively, and
confirmed that the cagA gene resides approximately 14 kb upstream of the sgcA,B locus.
The involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing
the sgcA mutants in vivo to restore C-1027 production.

Subsequent DNA sequence analysis provided the complete enediyne C-1027 
gene cluster sequence (SEQ ID NOs: 1 and 2) revealing 50 open reading frames which are 
summarized in Tables I and II. These results represent the first cloning of a gene cluster for 
enediyne anti-tumor antibiotic biosynthesis. 




Table I. C-1027 gene cluster open reading frames (-7 to 26), primers for ORF
amplification, and proposed functions

ORF   Size (bp)    Relative position  Fwd primer           Rev primer           SEQ ID NOs  Function
-7    648          [illegible]        ATG GGC ATG ACG GGT  CTA GAG GAT CCC GGG  3, 4        very weak homology to putative hydroxylase
-6    549          1478-930           ATG CCG CGG ATT CCC  TCA GCT GTC GAT GTC  5, 6        Viral infectivity potentiator protein
-5    1065         2713-1649          ATG ACC ATC GCC ACT  TCA GAG GCC GAG CAC  7, 8        N-truncated Methionine synthase (likely pseudogene)
-4    387          3238-2851          ATG AGC TCG CTA CTG  CTA GGA GCC GGT CGC  9, 10       Viral transcription factor
-3    1530         4971-3442          ATG AGC AGC AGC GCC  TCA TTC GTC GGC TGC  11, 12      Viral Homolog possibly primase
-2    3027         5982-7478          GTG AGG GCT CTG CCG  TCA GAC GGC GGA GGG  13, 14      Glycerol-Phosphate ABC Transporter (SnoX drug resistance)
-1    2328         9900-7573          GTG AGC GTC ACC GAC  TCA ACC CGC CCT GCG  15, 16      UvrA-like drug resistance pump
0     1368         11349-9982         ATG AGG ATG CTG GTG  GTG GCT GTG CTC GCA  17, 18      Na+/H+ efflux pump
1     999          28590-29588        ATG AGG ATG CTG GTG  TCA GCC GAC GGC GTC  19, 20      dNTP-glucose dehydratase
2     1566         29632-31197        GTG ACA GCA GTC AAG  TCA TGT GGC CGG TTG  21, 22      Transmembrane efflux protein
3     1311         31280-32590        GTG GAG TAC TGG AAC  TCA GGC CTG AGG GGC  23, 24      Coenzyme F390 synthase phenylacetyl-CoA ligase
4     1584         32809-34392        GTG CCC CAC GGT GCA  CTA CAG CCC TCC GAG  25, 26      phenol hydroxylase chlorophenol-4-monooxygenase
5     [illegible]  35274-34458        ATG TCT TCA ACC CGT  TCA GCC GCG CAG GAA  27, 28      citrate transport protein
6     1272         17924-16653        ATG CTG GAG AAA TGC  TCA GAC GAG CTC CTT  29, 30      C-methyl transferase hydroxylase
7     735          16653-15919        ATG GAG TAC GGC CCC  TCA TGC CGT GCG CAC  31, 32      N-methyltransferase
8     1233         15922-14690        ATG AGC GGC GGC CCG  TCA CCT CGC CGG ACG  33, 34      Aminotransferase
9     432          14643-14212        ATG TCG TTA CGT CAC  TCA GCC GAA GGT CAG  35, 36      CagA
10    1068         13012-14079        ATG AAG GCA CTT GTA  TCA GGC CGC GAT CTC  37, 38      dNTP-glucose synthase
11    1485         12835-11351        GTG GAC GTG TCA GCG  TCA GGA CCG CGC ACC  39, 40      Hydroxylase, Halogenase
12    579          25564-24986        ATG AAG CCG ATC GGG  TCA GGA CGA CTT GTT  41, 42      dNTP-4-keto-6-deoxyglucose 3,5-epimerase
13    1137         24702-23566        ATG CCT TCC CCC TTC  TCA GGT GCG CTC GGC  43, 44      3-O-acyltransferase
14    1455         22878-21424        GTG AGA GAC GGC CGG  TCA CGT GGT GAT GGC  45, 46      Coenzyme F-390 Synthase Phenylacetyl CoA Ligase
15    1482         21407-19926        ATG ACC GAC CAG TGC  TCA CAG CAA CTC CTC  47, 48      Anthranilate Synthase I
16    663          19929-19267        GTG AGC TTG TGG TCT  TCA GGC CGG TTC GGC  49, 50      Anthranilate Synthase II
17    1161         19191-18031        GTG CGT CCC TTC CGT  TCA GCG GAG CGG ACG  51, 52      epoxide hydrolase
18    423          35938-35516        ATG CCA GCA CCG ACT  TCA GTC GTT GCC GCG  53, 54      Unknown
19    1380         27214-28593        ATG CGG GTG ATG ATC  TCA TCG GTC CGC CTC  55, 56      glycosyl transferase
20    1356         25815-27170        ATG ACC AAG CAC GCC  TCA TAC GGC GGC GCC  57, 58      squalene monooxygenase
21    672          23546-22875        GTG AGC GCA CAA CTC  TCA CGG CTG TGC CTG  59, 60      hypothetical Fe-S flavoprotein
22    816          35274-34458        ATG TCT TCA ACC CGT  TCA GCC GCG CAG GAA  61, 62      haloacetate dehalogenase hydrolase
23    1380         37559-38938        ATG ACG ACG TCC GAC  TCA GGA GGT GAA GGG  63, 64      peptide synthetase
24    1620         40986-39367        ATG GCA TTG ACT CAA  TCA GCG CAG CTG GAT  65, 66      Histidine Ammonia lyase
25    1560         42611-41052        ATG ACG CGG CCG GTG  TCA GCG GGT GAG CCG  67, 68      Type II adenylation protein
26    282          38983-39264        GTG TCC ACC GTT TCC  TCA CTG CGT TCC GGA  69, 70      Type II peptidyl carrier protein

Table II. C-1027 gene cluster open reading frames (27 to 42), primers for ORF
amplification, and proposed functions

ORF   Relative position      Fwd primer                       Rev primer                       SEQ ID NOs  Function
27    439[illegible]5-46023  GTG TGC CCG GTG ACA GAC          TCA GCC CAC GGG CTG GGA          71, 72      Antibiotic Transporter
28    46167-47171            GTG TTG GGC GAT GAG GAC          TCA GAC CGC GGA CAT CTG          73, 74      Co-methyltransferase
29    47227-4[illegible]85   ATG GCC GGC CTG GTC ATG          TCA GGA CCC GAG GGT CAC          75, 76      p450 hydroxylase
30    48610-497[illegible]   GTG GAC CAG ACG TCT ACG          TCA TGC AGG TGC AGC GTG          77, 78      Oxidoreductase
31    50350-51390            ATG AGG CCG CTC GTT CGG          TCA TCC CGG CCC GGC GGC          79, 80      Unknown Protein
32    51420-52341            ATG AGA ACG CGG                  TCA CGG CCG GAG                  81, 82      Oxidoreductase
33    53241-54074            GTG TAT CAG CCG                  CTA CTC ATT CCA                  83, 84      Unknown Protein
34    54230-55379            ATG TCT ACG GGC TAT CTC          TCA GCC GCC GGT GGC GCC          85, 86      Unknown Protein
35    56027-56881            [illegible] TTC TCC CCC GCC GCC  TCA GTA CGC CTG GTG GGC          87, 88      Oxidase/Dehydrogenase
36    56928-57730            ATG AAT TCG CTC GAC GAC          [illegible] GCT CCC GGT CGC CGC  89, 90      Unknown Protein
37    57834-58304            ATG ACC GCG ACG AAT CCT          CTA GGC GGC GCG TCC CGC          91, 92      Regulatory
38    58440-60091            ATG AGC ACC ACG GCC GAG          TCA [illegible] GCG CGC CGA CGG  93, 94      Oxidoreductase
39    60092-60622            ATG ACC CTG GAG GCC TAC          TCA TGC [illegible] GCT CCC GGT  95, 96      Regulatory
40    60940-62020            GTG AAA [illegible] GAC TCT GCC  TCA ACG GCG AGT TGG CTG          97, 98      Regulatory
41    62045-62899            GTG ACC ACC AAC ACC ATC          TCA CCC GCG [illegible] TCG ATC  99, 100     Regulatory
42    62788-63164            (partial ORF)                    TCA CCT CGC CGT ACT CAC          101, 102    p450 hydroxylase
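
As an illustrative aid to reading the "Relative position" entries in the tables above, the following Python sketch computes the length of an ORF from its two coordinates. It rests on two assumptions that are not stated in the tables themselves: that the coordinates are inclusive, and that a first coordinate larger than the second indicates an ORF read from the complementary strand. The function names are hypothetical; the coordinates used in the example are those listed for ORF 1 and ORF -6.

```python
def orf_length(start: int, end: int) -> int:
    """Length in bp of an ORF given inclusive coordinates in either orientation."""
    return abs(end - start) + 1


def orf_strand(start: int, end: int) -> str:
    """Assumed convention: start < end is the forward strand, otherwise the reverse."""
    return "+" if start < end else "-"


def describe_orf(name: str, start: int, end: int) -> dict:
    """Summarize one table entry, including whether it spans whole codons."""
    length = orf_length(start, end)
    return {
        "orf": name,
        "strand": orf_strand(start, end),
        "length_bp": length,
        "whole_codons": length % 3 == 0,
    }


if __name__ == "__main__":
    print(describe_orf("ORF 1", 28590, 29588))   # listed size: 999 bp
    print(describe_orf("ORF -6", 1478, 930))     # listed size: 549 bp
```

Under these assumptions the computed lengths agree with the sizes listed for these two entries, which provides a simple consistency check on the coordinates.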



Surprisingly, sequence analysis failed to reveal any gene that resembles a
polyketide synthase. The C-1027 open reading frames, however, encode polypeptides
exhibiting a wide variety of enzymatic activities (e.g., epoxide hydrase, monooxygenase,
oxidoreductase, P-450 hydroxylase, etc.). The isolated C-1027 gene cluster can be used to
synthesize C-1027 enediyne antibiotics and/or analogues thereof. The C-1027 gene cluster
can be modified and/or augmented to increase C-1027 and/or C-1027 analogue production.

Alternatively, various components of the C-1027 gene cluster can be used to
synthesize and/or chemically modify a wide variety of metabolites. Thus, for example, ORF
6 (C-methyltransferase) can be used to methylate a carbon, while ORF 12, an epimerase, can
be used to change the stereochemical configuration of a sugar. The ORFs can be combined
in their native configuration or in modified configurations to synthesize a wide variety of
biomolecules/metabolites. Thus, for example, various combinations of C-1027 open reading
frames can be used to synthesize an enediyne core, to synthesize a deoxy sugar, to synthesize
a β-amino acid, to make a benzoxazolinate, etc. (see, e.g., Figures 2, 3, and 4).

The native C-1027 gene cluster ORFs can be re-ordered, modified, and
combined with other biosynthetic units (e.g. polyketide synthases (PKSs) or catalytic
domains thereof and/or non-ribosomal polypeptide synthetases (NRPSs) or catalytic domains
thereof) to produce a wide variety of molecules. Large chemical libraries can be produced
and then screened for a desired activity.

The C-1027 gene cluster also includes a number of drug resistance genes (see,
e.g., Table III) that confer resistance to C-1027 and/or metabolites involved in C-1027
biosynthesis, thereby permitting the cell to complete the enediyne biosynthesis. These
resistance genes can be used to confer enediyne resistance on a cell lacking such resistance
or to augment the enediyne resistance of a cell that does tolerate enediynes. Such cells can
be used to produce high levels of enediynes and/or enediyne metabolites, and/or enediyne
analogues.



Table III. C-1027 cluster drug resistance genes.

ORF       Protein                              Mechanism
ORF 9     CagA apoprotein                      Drug sequestering
ORF 2     SgcB transmembrane efflux protein    Drug exporting
ORF 27    Transmembrane transport protein      Drug exporting
ORF 0     Na+/H+ transporter                   Drug exporting
ORF -1    ABC transport (C-terminus)           Drug exporting
ORF -2    Glycerol phosphate transporter       Drug exporting
ORF -1    UvrA-like protein (N-terminus)       DNA repairing



I. Isolation, preparation, and expression of C-1027 nucleic acids.



The C-1027 gene cluster nucleic acids can be isolated, optionally modified,
and inserted into a host cell to create and/or modify a metabolic (biosynthetic) pathway and
thereby enable that host cell to synthesize and/or modify various metabolites. Alternatively
the C-1027 gene cluster nucleic acids can be expressed in the host cell and the encoded
C-1027 polypeptide(s) recovered for use as chemical reagents, e.g. in the ex vivo synthesis
and/or chemical modification of various metabolites. Either application typically entails
insertion of one or more nucleic acids encoding one or more isolated and/or modified C-1027
enediyne open reading frames in a suitable host cell. The nucleic acid(s) are typically in an
expression vector, a construct containing control elements suitable to direct expression of the
C-1027 polypeptides. The expressed C-1027 polypeptides in the host cell then act as
components of a metabolic/biosynthetic pathway (in which case the synthetic product of the
pathway is typically recovered) or the C-1027 polypeptides themselves are recovered. Using
the sequence information provided herein, cloning and expression of C-1027 nucleic acids
can be accomplished using routine and well known methods.

A) C-1027 nucleic acids. 

The nucleic acids comprising the C-1027 gene cluster are identified in Tables
I and II and are listed in the sequence listing provided herein. In particular, Tables I and II
identify genes and functions of open reading frames (ORFs) in the C-1027 enediyne
biosynthesis gene cluster and identify primers suitable for the amplification/isolation of any
one or more of the C-1027 open reading frames. Of course, using the sequence information
provided herein, other primers suitable for amplification/isolation of one or more C-1027
open reading frames can be determined according to standard methods well known to those
of skill in the art (e.g. using Vector NTI Suite™, InforMax, Gaithersburg, MD, USA).
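
As a non-limiting sketch of how end-specific primers of this kind can be derived from an ORF sequence, the following Python example takes the first 15 nucleotides of the coding strand as the forward primer and the reverse complement of the last 15 nucleotides as the reverse primer. This reading of the tabulated primers is an inference (the listed reverse primers begin with TCA or CTA, the reverse complements of the stop codons TGA and TAG) rather than a statement from the disclosure. The function names and the 30-nt toy ORF are hypothetical, and the sketch makes no attempt to check melting temperature or secondary structure.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def end_primers(orf: str, length: int = 15):
    """Forward primer = first `length` nt; reverse primer = revcomp of last `length` nt."""
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF is too short for non-overlapping end primers")
    return orf[:length], reverse_complement(orf[-length:])


if __name__ == "__main__":
    toy_orf = "ATGGCCAAGCTGACCGGCGAGTTCGACTGA"  # hypothetical sequence, not from the cluster
    fwd, rev = end_primers(toy_orf)
    print("Fwd:", fwd)
    print("Rev:", rev)
```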



Typically such amplifications will utilize the DNA or RNA of an organism
containing the requisite genes (e.g. Streptomyces globisporus) as a template. Typical
amplification conditions include the following PCR temperature program: initial denaturing
at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1 min at 60°C, 2 min at 72°C, followed by
an additional 7 min at 72°C. One of skill will appreciate that optimization of such a protocol,
e.g. to improve yield, etc. is routine (see, e.g., U.S. Patent No. 4,683,202; Innis (1990) PCR
Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA, etc.).
In addition, primers may be designed to introduce restriction sites and so facilitate cloning of
the amplified sequence into a vector.
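
The temperature program described above can be written out as a simple list of steps, which makes it easy to compare the total programmed hold time for different cycle numbers. The following Python sketch is illustrative only: the Step class and function names are hypothetical, and the totals cover programmed hold times only, since instrument ramp times are not specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


def pcr_program(cycles: int):
    """Build the step list: initial denaturation, N cycles, final extension."""
    if not 24 <= cycles <= 36:
        raise ValueError("the text describes 24-36 cycles")
    steps = [Step(94, 5 * 60)]                              # initial denaturing, 5 min
    steps += [Step(94, 45), Step(60, 60), Step(72, 120)] * cycles
    steps.append(Step(72, 7 * 60))                          # additional extension, 7 min
    return steps


def hold_minutes(cycles: int) -> float:
    """Total programmed hold time in minutes (ramp times excluded)."""
    return sum(step.seconds for step in pcr_program(cycles)) / 60


if __name__ == "__main__":
    for n in (24, 30, 36):
        print(n, "cycles:", hold_minutes(n), "min")
```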

In one embodiment, this invention provides nucleic acids for the recombinant
expression of an enediyne (e.g. a C-1027 enediyne or an analogue thereof). Such nucleic
acids include isolated gene cluster(s) comprising open reading frames encoding polypeptides
sufficient to direct the assembly of the enediyne. In other embodiments of this invention, the
C-1027 open reading frames may be unchanged, but the control elements (e.g. promoters,
enhancers, etc.) may be modified. In still other embodiments, the nucleic acids may encode
selected components (e.g. one or more C-1027 or modified C-1027 open reading frames)
and/or may optionally contain other heterologous biosynthetic elements including, but not
limited to, polyketide synthase (PKS) and/or non-ribosomal polypeptide synthetase (NRPS)
modules or enzymatic domains.

Such variations may be introduced by design, for example to modify a known
molecule in a specific way, e.g. by replacing a single substituent of the enediyne with
another, thereby creating a derivative enediyne molecule of predicted structure.
Alternatively, variations can be made randomly, for example by making a library of
molecular variants of a known enediyne by systematically or haphazardly replacing one or
more open reading frames in the biosynthetic pathway. Production of alternative/modified
enediyne, and hybrid enediyne PKSs and/or NRPSs and hybrid systems is described below.

Using the information provided herein, other approaches to cloning the desired
sequences will be apparent to those of skill in the art. For example, the enediyne, and/or
optionally PKS and/or NRPS modules or enzymatic domains of interest can be obtained
from an organism that expresses such, using recombinant methods, such as by screening
cDNA or genomic libraries derived from cells expressing the gene, or by deriving the gene
from a vector known to include the same. The gene can then be isolated and combined with
other desired biosynthetic elements using standard techniques. If the gene in question is
already present in a suitable expression vector, it can be combined in situ, with, e.g., other
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather
than cloned. The nucleotide sequence can be designed with the appropriate codons for the
particular amino acid sequence desired. In general, one will select preferred codons for the
intended host in which the sequence will be expressed. The complete sequence can be
assembled from overlapping oligonucleotides prepared by standard methods and assembled
into a complete coding sequence (see, e.g., Edge (1981) Nature 292:756; Nambair et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259:6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g. Operon Technologies,
Alameda, CA).
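
The selection of preferred codons for an intended host can be sketched as a simple back-translation step. In the following Python example the codon table is a hypothetical, deliberately incomplete illustration that uses G/C-ending codons on the assumption of a GC-rich host such as Streptomyces; an actual design would take its preferences from codon usage data for the chosen host. The function name and the toy peptide are likewise hypothetical.

```python
# Illustrative preferred-codon table (assumption: a GC-rich host favours G/C-ending
# codons). Only a few residues are listed; "*" denotes the stop codon.
PREFERRED_CODON = {
    "M": "ATG",
    "A": "GCC",
    "G": "GGC",
    "L": "CTG",
    "S": "TCC",
    "T": "ACC",
    "*": "TGA",
}


def back_translate(protein: str, table=PREFERRED_CODON) -> str:
    """Translate a peptide into DNA using one preferred codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in table:
            raise KeyError(f"no preferred codon listed for {residue!r}")
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MAGLST*"))  # toy peptide, not taken from the cluster
```

The resulting coding sequence could then be divided into overlapping oligonucleotides for assembly as described above.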

Examples of such techniques and instructions sufficient to direct persons of
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to
Molecular Cloning Techniques, Methods in Enzymology 152, Academic Press, Inc., San
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Patent 5,017,478; and
European Patent No. 0,246,864.

B) Expression of C-1027 open reading frames.

The choice of expression vector depends on the sequence(s) that are to be
expressed. Any transducible cloning vector can be used as a cloning vector for the nucleic
acid constructs of this invention. However, where large clusters are to be expressed, it
is preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning
vectors be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids,
and BACs, for example, are advantageous vectors due to the ability to insert and stably
propagate therein larger fragments of DNA than in M13 phage and lambda phage,
respectively. Phagemids which will find use in this method generally include hybrids
between plasmids and filamentous phage cloning vehicles. Cosmids which will find use in
this method generally include lambda phage-based vectors into which cos sites have been
inserted. Recipient pool cloning vectors can be any suitable plasmid. The cloning vectors
into which pools of mutants are inserted may be identical or may be constructed to harbor
and express different genetic markers (see, e.g., Sambrook et al., supra). The utility of
employing such vectors having different marker genes may be exploited to facilitate a
determination of successful transduction.

In preferred embodiments of this invention, vectors are used to introduce
C-1027 biosynthesis genes or gene clusters into host (e.g. Streptomyces) cells. Numerous
vectors for use in particular host cells are well known to those of skill in the art. For
example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166 all describe
vectors for use in various Streptomyces hosts.

In one preferred embodiment, Streptomyces vectors are used that include
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E.
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.,
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).

The wildtype and/or modified C-1027 enediyne open reading frame(s) of this
invention can be inserted into one or more expression vectors, using methods known to
those of skill in the art. Expression vectors will include control sequences operably linked to
the desired open reading frame. Suitable expression systems for use with the present
invention include systems that function in eukaryotic and/or prokaryotic host cells.
However, as explained above, prokaryotic systems are preferred, and in particular, systems
compatible with Streptomyces spp. are of particular interest. Control elements for use in
such systems include promoters, optionally containing operator sequences, and ribosome
binding sites. Particularly useful promoters include control sequences derived from
enediyne, and/or PKS, and/or NRPS gene clusters. Other promoters (e.g. ermE* as
illustrated in Example 1) are also suitable. Other bacterial promoters, such as those derived
from sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find
use in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the β-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature also function in bacterial host cells. In
Streptomyces, numerous promoters have been described including constitutive promoters,
such as ErmE and TcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733),
as well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature,
378: 263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et
al. (1995) Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation
of expression of the enediyne open reading frame(s) relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression
vectors. A variety of markers are known which are useful in selecting for transformed cell
lines and generally comprise a gene whose expression confers a selectable phenotype on
transformed cells when the cells are grown in an appropriate selective medium. Such
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid.

The various enediyne cluster open reading frames, and/or PKS, and/or NRPS
clusters or subunits of interest can be cloned into one or more recombinant vectors as
individual cassettes, with separate control elements, or under the control of, e.g., a single
promoter. The various open reading frames can include flanking restriction sites to allow for
the easy deletion and insertion of other open reading frames so that hybrid synthetic
pathways can be generated. The design of such unique restriction sites is known to those of
skill in the art and can be accomplished using the techniques described above, such as site-
directed mutagenesis and PCR.
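
When choosing flanking restriction sites for a cassette, one routine check is that the chosen recognition sequence does not also occur within the open reading frame itself. The following Python sketch illustrates such a check for a handful of common enzymes. The function names and the toy sequence are hypothetical; the recognition sequences are the standard ones for the named enzymes, all of which are palindromic, so scanning a single strand suffices for this set.

```python
# Recognition sequences (5'->3') of a few commonly used restriction enzymes.
RECOGNITION_SITES = {
    "ApaI": "GGGCCC",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "SacII": "CCGCGG",
    "SphI": "GCATGC",
}


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition sequence, allowing overlapping matches."""
    sequence = sequence.upper()
    count, start = 0, 0
    while True:
        index = sequence.find(site, start)
        if index == -1:
            return count
        count += 1
        start = index + 1


def usable_flanking_enzymes(orf: str):
    """Enzymes whose sites are absent from the ORF and so can safely flank it."""
    return sorted(
        name for name, site in RECOGNITION_SITES.items() if count_sites(orf, site) == 0
    )


if __name__ == "__main__":
    toy_orf = "ATGGCCGGATCCAAGCTGTGA"  # hypothetical; contains one internal BamHI site
    print(usable_flanking_enzymes(toy_orf))
```

An internal site found in this way could be removed by a silent change introduced through site-directed mutagenesis, or a different flanking enzyme could be selected.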

Methods of cloning and expressing large nucleic acids such as gene clusters,
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc.
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad.
Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some
examples, nucleic acid sequences of well over 100 kb have been introduced into cells,
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al.
(1998) Genomics, 52: 1-8; Woon et al. (1998) Genomics, 50: 306-316; Huang et al. (1996)
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and expression of C-1027
enediyne is illustrated in Example 1.

C) Host cells. 

The vectors described above can be used to express various protein 
components of the enediyne, and/or enediyne shunt metabolites, and/or other modified 
metabolites for subsequent isolation and/or to provide a biological synthesis of one or more 
desired biomolecules (e.g. C-1027 and/or a C-1027 analogue, etc.). Where one or more 
proteins of the enediyne biosynthetic gene cluster are expressed (e.g. overexpressed) for 
subsequent isolation and/or characterization, the proteins are expressed in any prokaryotic or 
eukaryotic cell suitable for protein expression. In one preferred embodiment, the proteins 
are expressed in E. coli. 

Host cells for the recombinant production of the subject enediynes, enediyne 
metabolites, shunt metabolites, etc. can be derived from any organism with the capability of 
harboring a recombinant enediyne gene cluster and/or subset thereof. Thus, the host cells of 
the present invention can be derived from either prokaryotic or eukaryotic organisms. 
Preferred host cells are those of species or strains (e.g. bacterial strains) that naturally 
express enediynes. Such host cells include, but are not limited to, Actinomycetes, 
Actinoplanetes, Streptomycetes, Actinomadura, Micromonospora, and the like. 
Particularly preferred host cells include, but are not limited to, Streptomyces globisporus, 
Streptomyces lividans, Streptomyces coelicolor, Micromonospora echinospora ssp. 
calichensis, Actinomadura verrucosospora, Micromonospora chersina, Streptomyces 
carzinostaticus, and Actinomycete L585-6. Other suitable host cells include, but are not 
limited to, S. verticillus, S. ambofaciens, S. avermitilis, S. azureus, S. cinnamonensis, S. 
coelicolor, S. curacoi, S. erythraeus, S. fradiae, S. galilaeus, S. glaucescens, S. 
hygroscopicus, S. lividans, S. parvulus, S. peucetius, S. rimosus, S. roseofulvus, S. 
thermotolerans, and S. violaceoruber (see, e.g., Hopwood and Sherman (1990) Ann. Rev. 
Genet. 24: 37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited, 
etc.). 

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain 
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of 
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells, 
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and 
various myeloma cell lines). 

D) Recovery of the expression product. 

Recovery of the expression product (e.g., enediyne, enediyne analogue, 
enediyne biosynthetic pathway polypeptide, etc.) is accomplished according to standard 
methods well known to those of skill in the art. Thus, for example, where enediyne 
biosynthetic gene cluster proteins are to be expressed and isolated, the proteins can be 
expressed with a convenient tag to facilitate isolation (e.g. a His6 tag). Other standard 
protein purification techniques are suitable and well known to those of skill in the art (see, 
e.g., Quadri et al. (1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. 
Genet. 232: 313-321, etc.). 

Similarly, where components (e.g. enediyne biosynthetic cluster ORFs) are used 
to synthesize and/or modify various biomolecules (e.g. enediynes, enediyne analogues, shunt 
metabolites, etc.), the desired product and/or shunt metabolite(s) are isolated according to 
standard methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998) 
Biochemistry 37: 2084-2088; Deutscher (1990) Methods in Enzymology Volume 182: Guide 
to Protein Purification, M. Deutscher, ed., etc.). 

II. Use of C-1027 open reading frames in directed biosynthesis. 

Elements (e.g. open reading frames) of the C-1027 biosynthetic gene cluster 
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e. 
where the process is designed to modify and/or synthesize one or more particular preselected 
metabolite(s)). Essentially the entire C-1027 gene cluster can be used to synthesize a C-1027 
enediyne and/or a C-1027 enediyne analogue. Individual C-1027 cluster open reading 
frames can be used to perform chemical modifications on particular substrates and/or to 
synthesize various metabolites. Thus, for example, ORF 6 (C-methyltransferase) can be used 
to methylate a carbon, while ORF 7 (N-methyltransferase) can be used to methylate a 
nitrogen. ORF 12, an epimerase, can be used to change the configuration of a sugar, and 
ORF 8 (an aminotransferase) can be used to aminate a suitable substrate. Similarly, 
combinations of C-1027 open reading frames can be used to direct the synthesis of various 
metabolites (e.g. β-amino acids, deoxysugars, benzoxazolinates, and the like). These 
examples are merely illustrative. One of skill in the art, utilizing the information provided 
here, can perform literally countless chemical modifications and/or syntheses using either 
"native" enediyne biosynthesis metabolites as the substrate molecule, or other molecules 
capable of acting as substrates for the particular enzymes in question. Other substrates can 
be identified by routine screening. Methods of screening enzymes for specific activity 
against particular substrates are well known to those of skill in the art. 

The biosyntheses can be performed in vivo, e.g. by providing a host cell 
comprising the desired C-1027 gene cluster open reading frames, and/or in vitro, e.g., by 
providing the polypeptides encoded by the C-1027 gene cluster ORFs and the appropriate 
substrates and/or cofactors. 
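
The ORF assignments given in the subsections below for the illustrative pathways (Figures 2, 3A, 3B and 4) can be collected into a simple lookup structure. The following Python sketch is only a bookkeeping aid; the dictionary layout and the function name are assumptions introduced here, while the ORF numbers are those listed in the text. It shows which ORFs are named in more than one pathway.

```python
from collections import defaultdict

# ORF numbers as listed in the text for each illustrative pathway.
PATHWAYS = {
    "enediyne core": [17, 20, 21, 29, 30, 32, 35, 38],   # Figure 4
    "deoxy sugar": [10, 1, 12, 8, 6, 7, 19],              # Figure 2
    "beta-amino acid": [4, 11, 24, 23, 25, 26],           # Figure 3A
    "benzoxazolinate": [15, 16, 4, 11, 28, 3, 14, 13],    # Figure 3B
}


def orf_usage(pathways):
    """Map each ORF number to the list of pathways that name it."""
    usage = defaultdict(list)
    for name, orfs in pathways.items():
        for orf in orfs:
            usage[orf].append(name)
    return dict(usage)


usage = orf_usage(PATHWAYS)
# ORFs named in more than one pathway.
shared = {orf: names for orf, names in usage.items() if len(names) > 1}
print(shared)
```

With the lists above, only ORF 4 and ORF 11 appear in two pathways (β-amino acid and benzoxazolinate synthesis).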







A) Synthesis of enediynes and enediyne analogues. 

In one embodiment, this invention provides for the synthesis of C-1027 
enediynes and/or C-1027 analogues or derivatives. In a preferred embodiment, this is 
accomplished by providing a cell comprising a C-1027 gene cluster and culturing the cell 
under conditions whereby the desired enediyne or enediyne analogue is synthesized. The 
cell can be a cell that does not normally synthesize an enediyne and the entire gene cluster 
can be transfected into the cell. Alternatively, a cell that typically synthesizes enediynes can 
be utilized and all or part of the C-1027 gene cluster can be introduced into the cell. 

Enediyne derivatives/analogues can be produced by varying the order of, or 
kind of, gene cluster subunits present in the cell, and/or by changing the host cell (e.g. to a 
eukaryotic cell that glycosylates the biosynthetic product), and/or by providing altered 
metabolites (e.g. adding exogenous aglycones to a host that carries a gene cassette of the 
deoxysugar biosynthesis and glycosylation genes for the production of glycosylated 
metabolites), etc. 

In certain embodiments, the host cell need not be transfected with an entire 
C-1027 gene cluster. Rather, various components of a C-1027 gene cluster can be altered 
within a cell already harboring a C-1027 cluster. By varying or adding various biosynthetic 
open reading frames, C-1027 enediyne variants can be produced. 

Standard techniques of molecular biology (gene disruption, gene 
replacement, gene supplementation) can be used to modulate and/or otherwise alter enediyne 
and/or other metabolite (e.g. shunt metabolite) production in an organism that naturally 
synthesizes an enediyne (e.g. S. globisporus) or an organism that is modified to synthesize an 
enediyne. 

In addition, or alternatively, control sequences that alter the expression of 
various open reading frames can be introduced to alter the amount and/or timing of 
enediyne production. Thus, for example, by placing particular C-1027 open reading frames 
under control of a constitutive promoter (ermE*), C-1027 production was increased by as 
much as 4-fold (see, e.g., Table 3 and Example 1). 

Table 3. Alteration of C-1027 production by engineering the C-1027 biosynthesis gene 
cluster. 

Strain                    Yield (%) 
WT                        100 
WT/pKC1139                100 
WT/ermE*/ORF 2            >150 
WT/ORF 9                  >100 
WT/ermE*/ORF 9            <10 
WT/ORF 10, 11             >100 
WT/ermE*/ORF 10, 11       >100 
WT/ORF 9, 10, 11          >400 

ORF 2: transmembrane efflux protein; ORF 9: CagA apoprotein; ORF 10: TDP-glucose 
synthase; ORF 11: hydroxylase/halogenase. 

Where enediyne analogues are synthesized, it will often prove desirable to 
assay them for biological activity. Such assays are well known to those of skill in the art. 
One such assay is illustrated in Example 1. Briefly, this example depicts an assay of 
antibacterial activity against M. luteus as described by Hu et al. ((1988) J. Antibiot. 41: 
1575-1579). Other suitable assays for enediyne activity will be known to those of skill in the art. 

B) Use of C-1027 open reading frames to synthesize an enediyne core. 

The C-1027 open reading frames described herein, or variants thereof, can be 
used to synthesize an enediyne core, e.g., from a fatty acid precursor. One such synthetic 
pathway is illustrated in Figure 4. This reaction scheme utilizes ORF 17 (epoxide hydrase), 
ORF 20 (monooxygenase), ORF 21 (iron-sulfur flavoprotein), ORF 29 (P-450 hydroxylase), 
ORF 30 (oxidoreductase), ORF 32 (oxidoreductase), ORF 35 (proline oxidase), and ORF 38 
(P-450 hydroxylase) to synthesize an enediyne core. 

This synthetic pathway is not considered limiting, but merely illustrative. 
Using this as a model, one of ordinary skill in the art can design numerous other synthetic 
schemes to produce enediyne cores and/or core variants. 

C) Use of C-1027 open reading frames to synthesize deoxy sugars. 

The biosynthesis of various deoxy sugars (e.g., deoxyhexoses) typically shares 
a common key intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs, 
whose formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme, 
an NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 
223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl 
(ed). Marcel Dekker, New York). Similarly, the C-1027 gene cluster includes an NGDH 
enzyme which can be exploited to synthesize a variety of deoxy sugars. 

One illustrative synthetic pathway is shown in Figure 2. This biosynthetic 
scheme utilizes ORF 10 (dNDP-glucose synthase), ORF 1 (dNDP-glucose dehydratase), 
ORF 12 (epimerase), ORF 8 (aminotransferase), ORF 6 (C-methyltransferase), ORF 7 
(N-methyltransferase) and ORF 19 (glycosyl transferase). 

This synthetic pathway is not considered limiting, but merely illustrative. 
Using this as a model, one of ordinary skill in the art can design numerous other synthetic 
schemes to produce various deoxy sugars. 

D) Use of C-1027 open reading frames to synthesize β-amino acids. 

In still another embodiment, C-1027 biosynthetic polypeptides can be used in 
the biosynthesis of β-amino acids. One illustrative synthetic pathway is shown in Figure 3A. 
This biosynthetic scheme utilizes ORF 4 (hydroxylase), ORF 11 (hydroxylase/halogenase), 
ORF 24 (aminomutase), ORF 23 (type II NRPS condensation enzyme), ORF 25 (type II 
NRPS adenylation enzyme), and ORF 26 (type II peptidyl carrier protein). 

Again, this synthetic pathway is not considered limiting, but merely 
illustrative. Using this as a model, one of ordinary skill in the art can design numerous other 
synthetic schemes to produce other β-amino acids. 

E) Use of C-1027 open reading frames to synthesize benzoxazolinates. 

The C-1027 open reading frames can also be used to synthesize a 
benzoxazolinate. One illustrative synthetic pathway is shown in Figure 3B. This 
biosynthetic scheme utilizes ORF 15 (anthranilate synthase I), ORF 16 (anthranilate synthase 
II), ORF 4 (phenol hydroxylase/chlorophenol-4-monooxygenase), ORF 11 
(hydroxylase/halogenase), ORF 28 (O-methyltransferase), ORF 3 (coenzyme F390 
synthetase), ORF 14 (coenzyme F390 synthetase), and ORF 13 (O-acyltransferase). Again, 
this synthetic pathway is not considered limiting, but merely illustrative. Using this as a 
model, one of ordinary skill in the art can design numerous other synthetic schemes to 
produce other benzoxazolinates. 

III. Generation of chemical diversity. 

In addition to the directed modification and/or biosynthesis of various 
metabolites as described above, the C-1027 biosynthetic gene cluster open reading frames 
can be utilized, by themselves or in combination with other biosynthetic subunits (e.g. NRPS 
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems) to 
produce a wide variety of compounds including, but not limited to, various enediynes or 
enediyne derivatives, various polyketides, polypeptides, polyketide/polypeptide hybrids, 
various thiazoles, various sugars, various methylated polypeptides/polyketides, and the like. 

As with the directed production of various metabolites described above, such 
compounds can be produced, in vivo or in vitro, by catalytic biosynthesis, e.g., using large, 
enediyne cluster units and/or modular PKSs, NRPSs, and hybrid PKS/NRPS systems. In a 
preferred embodiment, large combinatorial libraries of cells harboring various 
megasynthetases can be produced by the random or directed modification of particular 
pathways and then selected for the production of a molecule or molecules of interest. It will 
be appreciated that, in certain embodiments, such libraries of megasynthetases/modified 
pathways can be used to generate large, complex combinatorial libraries of compounds 
which themselves can be screened for a desired activity. 

Such combinatorial libraries can be created by the deliberate 
modification/variation of selected biosynthetic pathways and/or by random/haphazard 
modification of such pathways. 

A) Directed engineering of novel synthetic pathways. 

In numerous embodiments of this invention, novel polyketides, polypeptides, 
and combinations thereof are created by modifying the enediyne gene cluster ORFs and/or 
known PKSs, and/or NRPSs so as to introduce variations into metabolites synthesized by the 
enzymes. Such variations may be introduced by design, for example to modify a known 
molecule in a specific way, e.g. by replacing a single monomeric unit within a polymer with 
another, thereby creating a derivative molecule of predicted structure. Such variations can 
also be made by adding one or more modules or enzymatic domains to a known PKS or 
NRPS or enediyne cluster, or by removing one or more modules from a known PKS or 
NRPS. 



Using any of these methods, it is possible to introduce PKS domains, NRPS 
domains, and enediyne domains into a megasynthetase. Mutations can be made to the 
native enediyne, and/or NRPS, and/or PKS subunit sequences and such mutants used in 
place of the native sequence, so long as the mutants are able to function with other subunits 
(domains) in the synthetic pathway. Such mutations can be made to the native sequences 
using conventional techniques such as by preparing synthetic oligonucleotides including the 
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS 
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad. 
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the 
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length) 
which hybridizes to the native nucleotide sequence (generally cDNA corresponding to the 
RNA sequence), at a temperature below the melting temperature of the mismatched duplex. 
The primer can be made specific by keeping primer length and base composition within 
relatively narrow limits and by keeping the mutant base centrally located (Zoller and Smith 
(1983) Meth. Enzymol. 100: 468). Primer extension is effected using DNA polymerase, the 
product is cloned, and clones containing the mutated DNA, derived by segregation of the 
primer-extended strand, are selected. Selection can be accomplished using the mutant primer as a 
hybridization probe. The technique is also applicable for generating multiple point 
mutations (see, e.g., Dalbadie-McFarland et al. (1982) Proc. Natl. Acad. Sci. USA 79: 6409). 
PCR mutagenesis will also find use for effecting the desired mutations. 
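
The mismatched-primer approach can be illustrated with a short Python sketch. It is an illustration only: the template is a hypothetical toy sequence, the function names are introduced here, and the melting temperature is estimated with the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation for short, perfectly matched oligonucleotides. A mismatched duplex melts below this estimate, which is why hybridization is carried out at a lower temperature.

```python
def wallace_tm(oligo):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def mutagenic_primer(template, pos, new_base, length=17):
    """Return a primer of odd `length` carrying `new_base` at its center.

    `template` is read 5'->3'; `pos` is the 0-based index of the base to change.
    The primer is written in the same orientation as `template`, so it would
    anneal to the complementary strand.
    """
    if length % 2 == 0 or not 10 <= length <= 20:
        raise ValueError("use an odd length between 10 and 20 nucleotides")
    half = length // 2
    if pos - half < 0 or pos + half >= len(template):
        raise ValueError("mutation too close to the end of the template")
    if template[pos].upper() == new_base.upper():
        raise ValueError("new base equals the native base")
    window = template[pos - half:pos + half + 1].upper()
    # Replace only the central base; all flanking bases still match.
    return window[:half] + new_base.upper() + window[half + 1:]


# Hypothetical toy template, not a sequence from the C-1027 cluster.
template = "ATGACCGAGCTGAAGCGTCTGGCCGACGAATTCGGC"
primer = mutagenic_primer(template, pos=17, new_base="A", length=17)
print(primer, wallace_tm(primer))
```

Keeping the mutant base in the center leaves equal matched arms on both sides, which corresponds to the specificity consideration described above.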

B) Random modification of enediyne pathways. 

In another embodiment, variations can be made randomly, for example by 
making a library of molecular variants (e.g. of a known enediyne) by randomly mutating one 
or more elements of the subject gene cluster or by randomly replacing one or more open 
reading frames in a gene cluster with one or more alternative open reading frames. 
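
The size of the design space opened up by replacing open reading frames with alternatives can be gauged by simple enumeration. The following Python sketch assumes three hypothetical pathway positions, each with a made-up pool of interchangeable modules; the position names and module labels are placeholders and do not correspond to specific ORFs of the cluster.

```python
from itertools import product

# Hypothetical pools of interchangeable modules for three pathway positions.
pools = {
    "loading": ["L1", "L2"],
    "extension": ["E1", "E2", "E3"],
    "tailoring": ["T1", "T2", "T3", "T4"],
}


def enumerate_arrangements(pools):
    """Yield every arrangement that takes one module from each position."""
    names = list(pools)
    for combo in product(*(pools[name] for name in names)):
        yield dict(zip(names, combo))


arrangements = list(enumerate_arrangements(pools))
print(len(arrangements))  # 2 * 3 * 4 arrangements
```

The count is the product of the pool sizes, so each added position or alternative multiplies the number of variants that a library would have to cover.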

The various open reading frames can be combined into a single multi-modular 
enzyme, thereby dramatically increasing the number of possible combinations obtained using 
these methods. These combinations can be made using standard recombinant or nucleic acid 
amplification methods, for example by shuffling nucleic acid sequences encoding various 
modules or enzymatic domains to create novel arrangements of the sequences, analogous to 
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S. 
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro, 
for example by combinatorial synthetic methods. Novel molecules or molecule libraries can 
be screened for any specific activity using standard methods. 

Random mutagenesis of the nucleotide sequences obtained as described above 
can be accomplished by several different techniques known in the art, such as by altering 
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly 
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect 
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing 
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens 
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or 
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid, 
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine, 
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like. 
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E. 
coli and propagated as a pool or library of mutant plasmids. 
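
The construction of a pool of random mutants can be mimicked in silico. The following Python sketch simulates only uniform point substitutions at a fixed per-base rate, which is a simplifying assumption; real mutagens and error-prone PCR have sequence-dependent biases. The parent sequence, rate, and library size are hypothetical values chosen for illustration.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of `seq` with each base substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(seq, size, rate, seed=0):
    """Build a pool of `size` independently mutated copies of `seq`."""
    rng = random.Random(seed)
    return [mutate(seq, rate, rng) for _ in range(size)]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical parent sequence (105 nt), not taken from the C-1027 cluster.
parent = "ATGGCCAAGCTGACCGTGGAA" * 5
library = make_library(parent, size=200, rate=0.02)
mean_mutations = sum(hamming(parent, v) for v in library) / len(library)
print(round(mean_mutations, 2))
```

The expected number of substitutions per variant is the rate multiplied by the sequence length, so the rate can be tuned to give roughly one to a few changes per clone.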

Large populations of random enzyme variants can be constructed in vivo 
using "recombination-enhanced mutagenesis." This method employs two or more pools of, 
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are 
generated using any convenient mutagenesis technique, described more fully above, and then 
inserted into cloning vectors. 

C) Incorporation and/or modification of non-C-1027 cluster elements. 

In either the directed or random approaches, nucleic acids encoding novel 
combinations of gene cluster ORFs are introduced into a cell. In one embodiment, nucleic 
acids encoding one or more enediyne synthetic cluster ORFs and/or PKS and/or NRPS 
domains are introduced into a cell so as to replace one or more domains of an endogenous 
gene cluster within a cell. Endogenous gene replacement can be accomplished using 
standard methods, such as homologous recombination. Nucleic acids encoding an entire 
enediyne, enediyne ORF, PKS, NRPS, or combination thereof can also be introduced into a 
cell so as to enable the cell to produce the novel enzyme, and, consequently, synthesize the 
novel polymer. In a preferred embodiment, such nucleic acids are introduced into the cell 
optionally along with a number of additional genes, together called a 'gene cluster,' that 
influence the expression of the genes, survival of the expressing cells, etc. In a particularly 
preferred embodiment, such cells do not have any other enediyne and/or PKS- and/or 
NRPS-encoding genes or gene clusters, thereby allowing the straightforward isolation of the 
molecule(s) synthesized by the genes introduced into the cell. 

Furthermore, the recombinant vector(s) can include genes from a single 
enediyne and/or PKS and/or NRPS gene cluster, or may comprise hybrid replacement PKS 
gene clusters with, e.g., a gene for one cluster replaced by the corresponding gene from 
another gene cluster. For example, it has been found that ACPs are readily interchangeable 
among different synthases without an effect on product structure. Furthermore, a given KR 
can recognize and reduce polyketide chains of different chain lengths. Accordingly, these 
genes are freely interchangeable in the constructs described herein. Thus, the replacement 
clusters of the present invention can be derived from any combination of PKS and/or NRPS 
gene sets that ultimately function to produce an identifiable polyketide. 

Examples of hybrid replacement clusters include, but are not limited to, 
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster, 
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas), 
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin, 
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin, 
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a 
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24: 
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.) 

A number of hybrid gene clusters have been constructed, having components 
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146). 
Other hybrid gene clusters, as described above, can easily be produced and screened using 
the disclosure herein, for the production of identifiable polyketides, polypeptides or 
polyketide/polypeptide hybrids. 

Host cells (e.g. Streptomyces) can be transformed with one or more vectors, 
collectively encoding a functional PKS/NRPS set, or a cocktail comprising a random 
assortment of enediyne ORFs and/or PKS and/or NRPS genes, modules, active sites, or 
portions thereof. The vector(s) can include native or hybrid combinations of enediyne ORFs, 
and/or PKS and/or NRPS subunits or cocktail components, or mutants thereof. As explained 
above, the gene cluster need not correspond to the complete native gene cluster but need only 
encode the necessary enediyne ORFs and/or PKS and/or NRPS components to catalyze the 
production of the desired product(s). 

IV. Variation of starter and/or extender units, and/or host cells. 

In addition to varying the nucleic acids comprising the subject gene cluster, 
variations in the products produced by the gene cluster(s) can be obtained by varying the 
host cell, the starter units and/or the extender units. Thus, for example, different fatty acids 
can be utilized in the enediyne synthetic pathway, resulting in different enediyne variants. 
Similarly, different intermediate metabolites can be provided (e.g. endogenously produced by 
the host cell, or produced by an introduced heterologous construct, and/or supplied from an 
exogenous source (e.g. the culture media)). Similarly, varying the host cell can vary the 
resulting product(s). For example, a gene cassette carrying the enediyne biosynthesis genes 
can be introduced into a deoxysugar-synthesizing host for the production of glycosylated 
enediyne metabolites. 

V. Use of C-1027 resistance genes. 

The antibiotic C-1027 and metabolites present in C-1027 biosynthesis are 
highly potent cytotoxins. Accordingly, the biosynthesis of C-1027 is facilitated by the 
presence of one or more antibiotic (e.g. enediyne) resistance genes. Without being bound to 
a particular theory, it is believed that CagA and SgcB function cooperatively to provide 
resistance. It is believed that the C-1027 chromophore is first sequestered by binding to the 
preapoprotein CagA (ORF 9) to form a complex, which is then transported out of the cell by 
the efflux pump SgcB (ORF 2) and processed by removing the leader peptide to yield the 
chromoprotein. Other genes that appear to mediate resistance in the C-1027 biosynthesis 
gene cluster include a transmembrane transport protein (ORF 27), a Na+/H+ transporter (ORF 
0), an ABC transporter (ORF -1, C-terminus), a glycerol phosphate transporter (ORF -2), and 
a UvrA-like protein (ORF -1, N-terminus) (see, e.g., Table 2). 

These ORFs and/or the polypeptides encoded by these ORFs can be utilized 
alone, or in combination with one or more other C-1027 ORFs, to confer resistance to 
enediyne or enediyne metabolites on a cell. This is useful in a wide variety of contexts, for 
example to increase production of enediynes. It is believed that C-1027 
resistance could be a limiting factor at the onset of C-1027 production. Provision of an extra 
copy of the plasmid-borne sgcB, and overexpression of sgcB under the control of the 
constitutive ermE* promoter, resulted in an increase of C-1027 production (see Example 1). 

In a therapeutic context, it is sometimes desirable to confer resistance on 
certain vulnerable cells. Thus, for example, where an enediyne is used as a 
chemotherapeutic, transfection of vulnerable, but healthy cells (e.g. liver cells remote from 
the tumor site, stem cells, etc.) with vector(s) expressing the resistance gene(s) permits 
administration of the enediyne at a higher dosage with fewer adverse effects to the organism. 
Such approaches have been taken using the multi-drug resistance gene (MDR1) expressing 
p-glycoprotein. 

In another embodiment, vectors are provided containing one or more 
resistance genes of this invention under control of a constitutive and/or inducible promoter, 
thereby providing a "ready-made" expression system suitable for the expression of an 
enediyne or enediyne metabolite at high concentration. 

It is also noted that the resistance genes are expected to confer resistance to 
compounds other than enediynes. The resistance genes are expected to confer resistance to 
essentially any cytotoxic compound that can act as a substrate for the resistance gene(s) of 
this invention. 

VI. Kits. 

In still another embodiment, this invention provides kits for practice of the 
methods described herein. In one preferred embodiment, the kits comprise one or more 
containers containing nucleic acids encoding one or more of the C-1027 biosynthesis gene 
cluster open reading frames. Certain kits may comprise vectors encoding the sgc gene 
cluster ORFs and/or cells containing such vectors. The kits may optionally include any 
reagents and/or apparatus to facilitate practice of the methods described herein. Such 
reagents include, but are not limited to, buffers, labels, labeled antibodies, bioreactors, cells, 
etc. 

In addition, the kits may include instructional materials containing directions 
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional 
materials provide protocols utilizing the kit contents for creating or modifying a C-1027 gene 
cluster and/or for synthesizing or modifying a molecule using one or more sgc gene cluster 
ORFs. While the instructional materials typically comprise written or printed materials, they 
are not limited to such. Any medium capable of storing such instructions and 
communicating them to an end user is contemplated by this invention. Such media include, 
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), 
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet 
sites that provide such instructional materials. 

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the claimed 
invention. 

Example 1 

Genes for production of the enediyne antitumor antibiotic C-1027 in Streptomyces 
globisporus are clustered with the cagA gene that encodes the C-1027 apoprotein 

We have been studying the biosynthesis of C-1027 in Streptomyces 
globisporus C-1027 as a model for the enediyne family of antitumor antibiotics (Thorson et 
al. (1999) Bioorg. Chem., 27: 172-188). C-1027 consists of a non-peptidic chromophore and 
an apoprotein, CagA [also called C-1027 AG (Otani et al. (1991) Agric. Biol. Chem. 55: 
407-417)]. The C-1027 chromophore is extremely unstable in the protein-free state; its structure 
was initially deduced from an inactive but more stable degradation product 
(Minami et al. (1993) Tetrahedron Lett. 34: 2633-2636) and subsequently confirmed by 
spectroscopic analysis of the natural product (Yoshida et al. (1993) Tetrahedron Lett. 34: 
2637-2640) (Fig. 1). While the absolute stereochemistry of the deoxysugar moiety was 
established by total synthesis (Iida et al. (1993) Tetrahedron Lett. 34: 4079-4082), the 8S, 
9S, 13S and 17R configurations of the C-1027 chromophore were based only on computer 
modeling (Okuno et al. (1994) J. Med. Chem. 37: 2266-2273). Although no biosynthetic 
study has been carried out specifically on C-1027, the polyketide origin of the enediyne 
cores has been implicated by feeding experiments with 13C-labeled acetate for the 
neocarzinostatin chromophore A (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299), 
dynemicin (Tokiwa et al. (1992) J. Am. Chem. Soc. 114: 4107-4110), and esperamicin (Lam 
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345); and deoxysugar biosynthesis has been 
well characterized in actinomycetes (Liu and Thorson (1994) Ann. Rev. Microbiol. 48: 
223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl 
(ed). Marcel Dekker, New York). Given the structural similarity of C-1027 to the other 
enediyne cores and to deoxysugars found in other secondary metabolites, we decided to 
clone either a PKS or a deoxysugar biosynthesis gene as the first step of identifying the 
C-1027 gene cluster from S. globisporus. 

Furthermore, the CagA apoprotein of C-1027 has been isolated, its amino acid 
sequence has been determined, and the corresponding cagA gene has been cloned and 
sequenced (Otani et al. (1991) Agric. Biol. Chem. 55: 407-417; Sakata et al. (1992) Biosci. 
Biotech. Biochem. 56: 1592-1595). Since genes encoding secondary metabolite production 
in actinomycetes have invariably been found to be clustered in one region of the microbial 
chromosome (Hopwood (1997) Chem. Rev. 97: 2465-2497), we further reasoned that 
mapping the cagA gene with either a putative PKS gene, a deoxysugar biosynthesis gene, or 
both to the same region of the S. globisporus chromosome should be viewed as strong 
evidence supporting the proposition that the cloned genes constitute the C-1027 biosynthesis 
gene cluster. 

We report here the cloning and sequencing of two genes, sgcA (Streptomyces 
globisporus C-1027) and sgcB, that encode a dNDP-glucose 4,6-dehydratase (NGDH) and a 
transmembrane efflux protein, respectively. The sgcA,B locus is indeed clustered with the 
cagA gene, leading to the localization of a 75-kb gene cluster from S. globisporus. The 
involvement of the cloned gene cluster in C-1027 biosynthesis was demonstrated by 
disrupting the sgcA gene to generate C-1027-nonproducing mutants and by complementing 
the sgcA mutants in vivo to restore C-1027 production. Our results, together with a similar 
effort in the Thorson laboratory on the calicheamicin gene cluster (Thorson et al. (1999) 
Bioorg. Chem., 27: 172-188), represent the first cloning of a gene cluster for enediyne 
antitumor antibiotic biosynthesis. 

Materials and methods. 

Bacterial strains and plasmids. 

Escherichia coli DH5α was used as a general host for routine subcloning
(Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY). E. coli XL1-Blue MR (Stratagene, La Jolla, CA)
was used as the transduction host for cosmid library construction. E. coli S17-1 was used as
the donor host for E. coli-S. globisporus conjugation (Mazodier et al. (1989) J. Bacteriol.
171: 3583-3585). Micrococcus luteus ATCC9431 was used as the testing organism to assay
the antibacterial activity of C-1027 (Hu et al. (1988) J. Antibiot. 41: 1575-1579). The
pGEM-3zf, -5zf, and -7zf and pGEM-T vectors were from Promega (Madison, WI). S.
globisporus strains and other plasmids in this study are listed in Table 3.

Table 3. Strains and plasmids.

Strain or plasmid   Relevant characteristics

S. globisporus:
C-1027              Wild-type (Hu et al. (1988) J. Antibiot. 41: 1575-1579)
AF40                Mutant resulting from acriflavine treatment of S. globisporus C-1027,
                    C-1027-nonproducing (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199)
AF44                Mutant resulting from acriflavine treatment of S. globisporus C-1027,
                    C-1027-nonproducing (Mao et al., supra)
AF67                Mutant resulting from acriflavine treatment of S. globisporus C-1027,
                    C-1027-nonproducing (Mao et al., supra)
SB1001              sgcA-disrupted mutant resulting from integration of pBS1012 into
                    S. globisporus C-1027, AprR, C-1027-nonproducing
SB1002              sgcA-disrupted mutant resulting from integration of pBS1013 into
                    S. globisporus C-1027, AprR, C-1027-nonproducing

Plasmids:
pOJ446              E. coli-Streptomyces shuttle cosmid, AprR (Bierman et al. (1992) Gene 116: 43-69)
pOJ260              E. coli vector, non-replicating in Streptomyces, AprR (Bierman et al., supra)
pKC1139             E. coli-Streptomyces shuttle vector, repTS, AprR (Bierman et al., supra)
pWHM3               E. coli-Streptomyces shuttle vector, ThR (Vara et al. (1989) J. Bacteriol. 171:
                    5872-5881)
pWHM79              ermE* promoter in pGEM-3zf (Shen and Hutchinson (1996) Proc. Natl. Acad.
                    Sci. USA 93: 6600-6604)
pBS1001             0.75-kb PCR product amplified from S. globisporus with type I PKS primers
                    in pGEM-T
pBS1002             0.55-kb PCR product amplified from S. globisporus with NGDH gene primers
                    in pGEM-T
pBS1003             0.73-kb PCR product amplified from pBS1005 with cagA primers in pGEM-T
pBS1004             pOJ446 S. globisporus genomic library cosmid
pBS1005             pOJ446 S. globisporus genomic library cosmid
pBS1006             pOJ446 S. globisporus genomic library cosmid
pBS1007             3.0-kb BamHI fragment from pBS1005 in pGEM-3zf, sgcA, sgcB
pBS1008             4.0-kb BamHI fragment from pBS1005 in pGEM-3zf, cagA
pBS1009             1.0-kb KpnI truncated fragment of sgcA from pBS1007 in pGEM-3zf
pBS1010             0.75-kb SacII/SphI internal fragment of sgcA from pBS1009 in pGEM-5zf
pBS1011             0.75-kb SacI/SphI internal fragment of sgcA from pBS1010 in pGEM-3zf
pBS1012             0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pOJ260
pBS1013             0.75-kb EcoRI/HindIII internal fragment of sgcA from pBS1010 in pKC1139
pBS1014             2.0-kb EcoRI/SphI fragment from pBS1007 in the SmaI/SphI sites of pWHM79,
                    ermE*, sgcA
pBS1015             2.5-kb EcoRI/HindIII fragment from pBS1014 in pWHM3, ermE*, sgcA
pBS1016             Self-ligation of the 5.2-kb KpnI fragment from pBS1007
pBS1017             0.45-kb EcoRI/SacI fragment from pWHM79 in EcoRI/SacI sites of pBS1016,
                    ermE*, sgcB
pBS1018             2.5-kb EcoRI/HindIII fragment from pBS1017 in pKC1139, ermE*, sgcB



Biochemicals and chemicals. 

Ampicillin, apramycin, nalidixic acid, and thiostrepton were from Sigma (St. 
Louis, MO). Unless specified otherwise, restriction enzymes and other molecular biology 
reagents were from standard commercial sources. 

Media and culture conditions. 

E. coli strains carrying plasmids were grown in Luria-Bertani (LB) medium
and were selected with appropriate antibiotics. S. globisporus strains were grown on ISP-4
(Difco Laboratories, Detroit, MI) or R2YE at 28°C for sporulation and in TSB (Hopwood et
al. (1985) Genetic manipulation of Streptomyces: a laboratory manual. John Innes
Foundation, Norwich, UK) supplemented with 5 mM MgCl2 and 0.5% glycine at 28°C, 250
rpm for isolation of genomic DNA. For transformation, S. globisporus strains were grown in
YEME (Hopwood et al., supra) for preparation of protoplasts and on R2YE for protoplast
regeneration. For conjugation, both the E. coli S17-1 donors and the S. globisporus
recipients (upon germination in TSB) were prepared in LB, and donors/recipients were
grown on either ISP-4 medium with 0.05% yeast extract and 0.1% tryptone or AS-1 medium
(Baltz (1980) Dev. Ind. Microbiol. 21: 43-54; Bierman et al. (1992) Gene 116: 43-69) at
30°C for isolation of exconjugants.

For C-1027 production, S. globisporus strains were grown either on R2YE or
ISP-4 agar medium at 28°C or in liquid medium by a two-stage fermentation. For liquid
culture, the seed inoculum was prepared by inoculating 50 mL of medium (consisting of 2%
glycerol, 2% dextrin, 1% fish meal, 0.5% peptone, 0.2% (NH4)2SO4, and 0.2% CaCO3, pH
7.0) with an aliquot of spore suspension and incubating at 28°C, 250 rpm for two days. The
seed culture (5%) was then added to a fresh 50 mL of the same medium, and incubation
continued at 28°C, 250 rpm for three to six days (Hu et al. (1988) J. Antibiot. 41:
1575-1579). The fermentation supernatants were harvested by centrifugation (Eppendorf 5415C,
4°C, 10 min, 14,000 rpm) on days 3, 4, and 5 and assayed for their antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579).
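
As a bench aid, the fermentation medium recipe and the 5% seed inoculum described above can be converted into amounts per flask. The sketch below is illustrative only. It assumes that every percentage in the recipe is weight per volume (g per 100 mL) and that the 5% inoculum is volume per volume, neither of which the text states; the function and variable names are hypothetical.

```python
# Illustrative sketch: per-flask amounts for the two-stage fermentation medium.
# ASSUMPTION: recipe percentages are w/v (g per 100 mL); the text does not say.
# ASSUMPTION: the 5% seed inoculum is vol/vol.

MEDIUM_PERCENT = {
    "glycerol": 2.0,
    "dextrin": 2.0,
    "fish meal": 1.0,
    "peptone": 0.5,
    "(NH4)2SO4": 0.2,
    "CaCO3": 0.2,
}


def grams_per_flask(volume_ml):
    """Return grams of each component needed for volume_ml of medium."""
    return {name: round(pct * volume_ml / 100.0, 3)
            for name, pct in MEDIUM_PERCENT.items()}


def inoculum_volume_ml(volume_ml, percent=5.0):
    """Return the seed-culture volume for a given inoculum percentage."""
    return volume_ml * percent / 100.0


recipe = grams_per_flask(50.0)       # amounts for one 50-mL flask
seed_ml = inoculum_volume_ml(50.0)   # seed culture added to the second stage

for component, grams in recipe.items():
    print(f"{component}: {grams} g")
print(f"seed inoculum: {seed_ml} mL")
```

The pH adjustment to 7.0 is not modeled here and still has to be made at the bench.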

DNA isolation and manipulation. 

Plasmid preparation and DNA extraction were carried out by using
commercial kits (Qiagen, Santa Clarita, CA). Total S. globisporus DNA was isolated
according to literature protocols (Hopwood et al. (1985) Genetic manipulation of
Streptomyces: a laboratory manual. John Innes Foundation, Norwich, UK; Rao et al. (1987)
Methods Enzymol. 153: 166-198). Restriction endonuclease digestion and ligation followed
standard methods (Sambrook et al. (1989) Molecular cloning, a laboratory manual. Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY). For Southern analysis, digoxigenin
labeling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).

DNA sequencing. 

Automated DNA sequencing was carried out on an ABI Prism 377 DNA
Sequencer using the ABI Prism dye terminator cycle sequencing ready reaction kit and
AmpliTaq DNA polymerase FS (Perkin-Elmer/ABI, Foster City, CA). Sequencing service
was provided by either the DBS Automated DNA Sequencing Facility, UC Davis, or Davis
Sequencing Inc. (Davis, CA). Data were analyzed by ABI Prism Sequencing 2.1.1 software
and the Genetics Computer Group program (Madison, WI).

Polymerase chain reaction (PCR). 

Primers were synthesized at the Protein Structure Laboratory, UC Davis.
PCR was carried out on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI) with Taq
polymerase and buffer from Promega. A typical PCR mixture consisted of 5 ng of S.
globisporus genomic or plasmid DNA as template, 25 pmol of each primer, 25 µM dNTP,
5% DMSO, 2 units of Taq polymerase, and 1x buffer, with or without 20% glycerol, in a final
volume of 50 µL. The PCR temperature program was as follows: initial denaturing at 94°C
for 5 min, 24-36 cycles of 45 sec at 94°C, 1 min at 60°C, and 2 min at 72°C, followed by an
additional 7 min at 72°C.
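
The temperature program above fixes the total hold time of a run once the cycle number is chosen. The following sketch adds up the hold times for the 24-cycle and 36-cycle extremes and converts the dNTP concentration into an amount per reaction. It is an illustration, not part of the reported method: ramp times between temperatures are ignored, the dNTP concentration is read as 25 µM, and the function names are hypothetical.

```python
# Illustrative sketch: hold time of the PCR program and dNTP amount per reaction.
# ASSUMPTION: ramp times between temperature steps are ignored.
# ASSUMPTION: the dNTP concentration in the text is read as 25 uM.


def program_minutes(cycles):
    """Total hold time in minutes for the program described in the text."""
    initial_denaturation = 5.0        # 94 C for 5 min
    per_cycle = 0.75 + 1.0 + 2.0      # 45 s at 94 C, 1 min at 60 C, 2 min at 72 C
    final_extension = 7.0             # 72 C for 7 min
    return initial_denaturation + cycles * per_cycle + final_extension


def nmol_in_reaction(conc_um, volume_ul):
    """umol/L times uL gives pmol; divide by 1000 to obtain nmol."""
    return conc_um * volume_ul / 1000.0


shortest = program_minutes(24)
longest = program_minutes(36)
dntp_nmol = nmol_in_reaction(25.0, 50.0)

print(f"hold time: {shortest:.0f} to {longest:.0f} min")
print(f"dNTP per 50-uL reaction: {dntp_nmol} nmol")
```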

For type II PKS, the following two pairs of degenerate primers were used:
5'-AGC TCC ATcVaG TCS ATG RTC GG-3' (forward, SEQ ID NO:103) / 5'-CC GGT
GTT SAC SGC GTMGAA CCA GGC G-3' (reverse, SEQ ID NO:104) and 5'-GAC ACV
GCN TGY TCB TCvV (forward, SEQ ID NO:105) / 5'-RTG SGC RTT VGT NCC RCT-3'
(reverse, SEQ ID NO:106) (B, C+G+T; N, A+C+G+T; R, A+G; S, C+G; V, A+C+G; Y, C+T)
(Seow et al. (1997) J. Bacteriol. 179: 7360-7368). No product was amplified
under any of the conditions tested. For type I PKS, the following pair of degenerate primers
was used: 5'-GCS TCC CGS OAC CTG GGC TTC GAC TC-3' (forward, SEQ ID NO:107) /
5'-AG SGA SGA SGA GCA GGC GGT STC SAC-3' (reverse, SEQ ID NO:108) (S, G+C)
(Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522). A distinctive product with the
predicted size of 0.75 kb was amplified in the presence of 20% glycerol and cloned into
pGEM-T according to the protocol provided by the manufacturer (Promega) to yield
pBS1001.

For NGDH, the following pair of degenerate primers was used: 5'-CS GGS
GSS GCS GGS TTC ATC GG-3' (forward, SEQ ID NO:109) / 5'-GG GWR CTG GYR
SGG SCC GTA GTT G-3' (reverse, SEQ ID NO:110) (R, A+G; S, C+G; W, A+T; Y, C+T)
(Decker et al. (1996) FEMS Lett. 141: 195-201). A distinctive product with the predicted
size of 0.55 kb was amplified and cloned into pGEM-T to yield pBS1002.
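
Each degenerate position in a primer multiplies the number of distinct oligonucleotides in the synthesized pool. The sketch below counts and enumerates that pool for the two NGDH primers, using the standard IUPAC ambiguity codes given in the primer legends. The primer strings are transcribed from the text as printed and the helper names are hypothetical; this is an illustration, not a primer design tool.

```python
# Illustrative sketch: degeneracy of the NGDH primers under IUPAC ambiguity codes.
# The primer strings are copied from the text as printed; helper names are hypothetical.
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "B": "CGT", "V": "ACG", "N": "ACGT",
}


def normalize(primer):
    """Drop the spacing used for readability and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in normalize(primer):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every non-degenerate sequence in the primer pool."""
    choices = [IUPAC[base] for base in normalize(primer)]
    return ["".join(combo) for combo in product(*choices)]


NGDH_FORWARD = "CS GGS GSS GCS GGS TTC ATC GG"
NGDH_REVERSE = "GG GWR CTG GYR SGG SCC GTA GTT G"

forward_pool = expand(NGDH_FORWARD)
reverse_pool = expand(NGDH_REVERSE)

print(len(normalize(NGDH_FORWARD)), "nt, degeneracy", degeneracy(NGDH_FORWARD))
print(len(normalize(NGDH_REVERSE)), "nt, degeneracy", degeneracy(NGDH_REVERSE))
```

Both primers carry six two-fold degenerate positions, so each pool contains 64 sequences.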

For cagA, the following pair of primers, flanking its coding region, was
used: 5'-AG GTG GAG GCG CTC ACQ GAG-3' (forward, SEQ ID NO:111) / 5'-G GGC
GTC AGG CCG TAA GAA G-3' (reverse, SEQ ID NO:112) (Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). A distinctive product with the predicted size of 0.73
kb was amplified from pBS1005 and cloned into pGEM-T to yield pBS1003.

Genomic library construction and screening.

S. globisporus genomic DNA was partially digested with MboI to yield a
smear around 60 kb, as monitored by electrophoresis on a 0.3% agarose gel. This sample
was dephosphorylated upon treatment with shrimp alkaline phosphatase and ligated into the
E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene 116: 43-69) that was
prepared by digestion with HpaI, shrimp alkaline phosphatase treatment, and additional
digestion with BamHI. The resulting ligation mixture was packaged with the Gigapack II
XL two-component packaging extract (Stratagene). The package mixture was transduced
into E. coli XL1-Blue MR. The transduced cells were spread onto LB plates containing
apramycin (100 µg/mL) and incubated at 37°C overnight. The titer of the primary library
was approximately 6,000 colony-forming units per ng of DNA. Restriction enzyme analysis
of twelve randomly selected cosmids confirmed that the average size of inserts was about 35
to 45 kb (Rao et al. (1987) Methods Enzymol. 153: 166-198).

To screen the genomic library, colonies from five LB plates containing
apramycin (100 µg/mL, with approximately 2,000 colonies per plate) were transferred to
nylon transfer membranes (Micro Separations, Inc., Westborough, MA) and screened by
colony hybridization with the PCR-amplified 0.55-kb NGDH fragment from pBS1002 as a
probe. The positive cosmid clones were re-screened by PCR with primers for NGDH and
confirmed by Southern hybridization (Sambrook et al., supra). Further restriction enzyme
mapping and chromosomal walking of these overlapping cosmids led to the genetic
localization of the 75-kb sgc gene cluster, as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). A 3.0-kb BamHI fragment from pBS1005 that hybridized to the NGDH
probe was cloned into the same sites of pGEM-3zf to yield pBS1007. Similarly, a 4.0-kb
BamHI fragment from pBS1005 that hybridized to the PCR-amplified 0.73-kb cagA probe
from pBS1003 was cloned into the same sites of pGEM-3zf to yield pBS1008 (Fig. 5B).
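
The screen described above covered about 10,000 colonies carrying inserts of roughly 35 to 45 kb. Whether a library of that size is likely to contain a given locus can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size and N is the number of clones. The sketch below applies it under a loud assumption: the genome size of S. globisporus is not given in the text, and the 8,000-kb value used here is only a placeholder of the order typical for Streptomyces. The 40-kb insert size is the midpoint of the reported range, and all names are hypothetical.

```python
# Illustrative sketch: genome coverage of a cosmid library (Clarke-Carbon estimate).
# ASSUMPTION: genome size of 8,000 kb is a placeholder, not a value from the text.
# ASSUMPTION: 40 kb is used as the midpoint of the reported 35- to 45-kb insert range.
import math

ASSUMED_GENOME_KB = 8000.0
MEAN_INSERT_KB = 40.0
CLONES_SCREENED = 10000


def fold_coverage(n_clones, insert_kb, genome_kb):
    """Total cloned DNA expressed in genome equivalents."""
    return n_clones * insert_kb / genome_kb


def probability_represented(n_clones, insert_kb, genome_kb):
    """P = 1 - (1 - f)^N, with f = insert size / genome size."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(probability, insert_kb, genome_kb):
    """Smallest N giving at least the requested probability."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


coverage = fold_coverage(CLONES_SCREENED, MEAN_INSERT_KB, ASSUMED_GENOME_KB)
p_hit = probability_represented(CLONES_SCREENED, MEAN_INSERT_KB, ASSUMED_GENOME_KB)
n_99 = clones_needed(0.99, MEAN_INSERT_KB, ASSUMED_GENOME_KB)

print(f"fold coverage: {coverage:.0f}x")
print(f"probability a given locus is present: {p_hit:.6f}")
print(f"clones needed for 99% probability: {n_99}")
```

Under these assumptions the screened colonies oversample the genome many times, which is consistent with recovering several overlapping positive cosmids for a single locus.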

Generation of sgcA mutants by insert-directed homologous recombination in S.
globisporus.

A 1.0-kb KpnI fragment from pBS1007, containing the C-terminal truncated
sgcA, was subcloned into pGEM-3zf to yield pBS1009. An internal fragment of sgcA was
moved sequentially as a 0.75-kb SacII/SphI fragment from pBS1009 into the same sites of
pGEM-5zf to yield pBS1010 and as a 0.75-kb SacI/SphI fragment from pBS1010 into the
same sites of pGEM-3zf to yield pBS1011. The latter plasmid was digested with EcoRI and
HindIII, and the resulting 0.75-kb EcoRI/HindIII fragment was cloned into the same sites of
pOJ260 and pKC1139 (Bierman et al. (1992) Gene 116: 43-69) to yield pBS1012 and
pBS1013, respectively.

Introduction of pBS1012 and pBS1013 into S. globisporus was carried out by
either polyethylene glycol (PEG)-mediated protoplast transformation (Hopwood et al. (1985)
Genetic manipulation of Streptomyces: a laboratory manual. John Innes Foundation,
Norwich, UK) or E. coli-S. globisporus conjugation (Bierman et al. (1992) Gene 116: 43-69;
Matsushima and Baltz (1996) Microbiology 142: 261-267; Matsushima et al. (1994) Gene
146: 39-45), methods for both of which were developed recently in our laboratory. In brief,
for transformation, pBS1012 and pBS1013 were propagated in E. coli ET12567 (MacNeil et
al. (1992) Gene 111: 61-68), and the resulting double-stranded plasmid DNA was denatured by
alkaline treatment (Oh and Chater (1997) J. Bacteriol. 179: 122-127). The denatured DNA (5
µL) and 200 µL of 25% PEG 1000 in P buffer (Hopwood et al., supra) were sequentially
added to 50 µL of S. globisporus protoplasts (10^9) in P buffer. The resulting suspension was
mixed immediately and spread on R2YE plates. After incubation at 28°C for 16 to 20 hrs,
the plates were overlaid with soft R2YE (0.7% agar) containing apramycin (100 µg/mL, final
concentration); incubation continued until colonies appeared (in 5 to 7 days). For
conjugation, E. coli S17-1(pBS1012) or E. coli S17-1(pBS1013) was grown to an OD600 of
0.3 to 0.4. Cells from a 20-mL culture were pelleted by centrifugation, washed in LB, and
resuspended in 2 mL of LB as the E. coli donors. S. globisporus spores (10^3 to 10^9) were
washed, resuspended in TSB, and incubated at 50°C for 10 min to activate germination.
After additional incubation at 37°C for 2 to 5 hrs, the spores were pelleted and resuspended
in LB as the S. globisporus recipients. The donors (100 µL) and recipients (100 µL) were
mixed and spread equally onto two modified ISP-4 or AS-1 plates freshly supplemented with
10 mM MgCl2 (see Media and culture conditions). The plates were incubated at 28°C for 16
to 22 hrs. After removal of most of the E. coli S17-1 donors by washing the surface with
sterile water, the plates were overlaid with 3 mL of soft LB (0.7% agar) containing nalidixic
acid (50 µg/mL, final concentration) and apramycin (100 µg/mL, final concentration) and
incubated at 28°C until exconjugants appeared (in approximately 5 days).

Unlike pBS1012, which is a Streptomyces non-replicating plasmid, pBS1013
bears a temperature-sensitive Streptomyces replication origin (Bierman et al. (1992) Gene
116: 43-69; Muth et al. (1989) Mol. Gen. Genet. 219: 341-348) that is unable to replicate at
temperatures above 34°C (Table 3), while the S. globisporus wild-type strain grows normally
up to 37°C. Thus, spores of S. globisporus (pBS1013), from either the transformants or the
exconjugants, were spread onto R2YE plates containing apramycin (100 µg/mL). The plates
were incubated directly at 37°C, and mutants, resulting from single crossover homologous
recombination between pBS1013 and the S. globisporus chromosome, were readily obtained
in 7 to 10 days. Alternatively, the plates were first incubated at 28°C for 2 days until
pinpoint-size colonies became visible and then shifted to 37°C to continue incubation.
Mutants resulting from single crossover homologous recombination grew out of the original
pinpoint-size colonies as easily distinguishable sectors in 7 to 10 days.

Construction of the sgcA and sgcB expression plasmids.

pBS1007 was digested with EcoRI and made blunt-ended by treatment with
the Klenow fragment of DNA polymerase I. Upon additional digestion with SphI, the
resulting 2.0-kb blunt-ended SphI fragment containing the intact sgcA gene was cloned into
the SmaI/SphI sites of pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci. USA 93:
6600-6604) to yield pBS1014. The latter was digested with EcoRI and HindIII, and the resulting
2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pWHM3 (Vara et al.
(1989) J. Bacteriol. 171: 5872-5881) to yield pBS1015, in which the expression of sgcA is
under the control of the ermE* promoter (Bibb et al. (1994) Mol. Microbiol. 14: 533-545).

Alternatively, pBS1007 was digested with KpnI, removing most of the sgcA
gene, and the 5.2-kb KpnI fragment was recovered and self-ligated to yield pBS1016. The
ermE* promoter was subcloned from pWHM79 (Shen et al. (1996) Proc. Natl. Acad. Sci.
USA 93: 6600-6604) as a 0.45-kb EcoRI/SacI fragment and cloned into the same sites of
pBS1016 to yield pBS1017. The latter was digested with EcoRI and HindIII, and the
resulting 2.5-kb EcoRI/HindIII fragment was cloned into the same sites of pKC1139 to yield
pBS1018, in which the expression of sgcB is under the control of the ermE* promoter.

Determination of C-1027 production. 

The production of C-1027 was detected by assaying its antibacterial activity
against M. luteus (Hu et al. (1988) J. Antibiot. 41: 1575-1579). From liquid culture,
fermentation supernatant (180 µL) was added to stainless steel cylinders placed on LB plates
pre-seeded with overnight M. luteus culture (0.01% vol/vol). From solid culture, a small
square block (0.5 x 0.5 x 0.5 cm) of agar from either R2YE or ISP-4 medium was directly
placed on M. luteus-seeded LB plates. The plates were incubated at 37°C for 24 hrs, and
C-1027 production was estimated by measuring the size of inhibition zones.

Nucleotide sequence accession number. 

The nucleotide sequence reported here has been deposited in the GenBank
database with the accession number AF201913.

Results.

No polyketide synthase gene was amplified by PCR from S. globisporus. 

On the assumption that the C-1027 enediyne core is of polyketide origin, the
PCR approach was adopted to screen S. globisporus for any putative PKS genes, although it
is far from certain a priori whether the biosynthesis of the enediyne core invokes a PKS and,
if so, whether the enediyne PKS will exhibit a type I or type II structural organization. PCR
methods for cloning either type I or type II PKS genes have been developed, and these
methods have proven to be very effective in cloning PKS genes from various
polyketide-producing actinomycetes (Kakavas et al. (1997) J. Bacteriol. 179: 7515-7522; Seow
et al. (1997) J. Bacteriol. 179: 7360-7368). While no distinctive product was amplified under
any of the conditions examined with both pairs of primers designed for type II PKS, a single
product with the expected size of 0.75 kb was readily amplified by PCR from S. globisporus
with primers designed for type I PKS and was subsequently cloned (pBS1001). Intriguingly,
sequence analysis of six randomly selected pBS1001 clones yielded an identical product
(indicative of a specific PCR amplification), the deduced amino acid sequence of which,
however, showed no homology to known PKSs (data not shown), excluding the possibility
of using PKS as a probe to identify the sgc biosynthesis gene cluster.

Cloning of a putative NGDH gene by PCR from S. globisporus.

The biosynthesis of various deoxyhexoses shares a common key
intermediate, 4-keto-6-deoxyglucose nucleoside diphosphate or its analogs, whose
formation from glucose nucleoside diphosphate is catalyzed by the NGDH enzyme, an
NAD+-dependent oxidoreductase (Liu and Thorson (1994) Ann. Rev. Microbiol. 48:
223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd ed. W. R. Strohl
(ed). Marcel Dekker, New York). The PCR method was adopted to clone the putative
NGDH gene from S. globisporus with primers designed according to the homologous
regions of various NGDH enzymes from actinomycetes (Decker et al. (1996) FEMS Lett.
141: 195-201), resulting in the amplification of a single product with the expected size of
0.55 kb (pBS1002). Sequence analysis of pBS1002 confirmed its identity as a part of a
putative NGDH gene.

To clone the complete NGDH gene, an S. globisporus genomic library,
constructed in the E. coli-Streptomyces shuttle vector pOJ446 (Bierman et al. (1992) Gene
116: 43-69; Rao et al. (1987) Methods Enzymol. 153: 166-198), was analyzed by Southern
hybridization with the PCR-amplified 0.55-kb fragment from pBS1002 as a probe. Of the
10,000 colonies screened, 36 positive colonies were identified, 9 of which were confirmed
by PCR to harbor the NGDH gene. Restriction enzyme mapping showed that all of them
contained a single 3.0-kb BamHI fragment hybridizing to the NGDH probe. Additional
chromosomal walking from this locus eventually led to the localization of the 75-kb sgc gene
cluster, covered by 18 overlapping cosmids as represented by pBS1004, pBS1005, and
pBS1006 (Fig. 5A). The 3.0-kb BamHI fragment was subcloned (pBS1007) (Fig. 5B), and
its nucleotide (nt) sequence was determined.

Analysis of the DNA sequences of the sgcA and sgcB genes.

Two complete open reading frames (ORFs) (sgcA and sgcB) were identified
within the 3.0-kb BamHI fragment of pBS1007, the 3,035-nt sequence of which is shown in
Figure 6. The sgcA gene most likely begins with an ATG at nt 101, preceded by a probable
ribosome binding site (RBS), GGAGG, and ends with a TGA stop codon at nt 1099. sgcA
should therefore encode a 332-amino acid protein with a molecular weight of 36,341 and an
isoelectric point of 6.01. A Gapped-BLAST search showed that the deduced sgcA gene
product is highly homologous to various putative and known NGDH enzymes from
antibiotic-producing actinomycetes, including Gdh from the erythromycin biosynthesis gene
cluster in Saccharopolyspora erythraea (64% identity and 70% similarity) (Linton et al.
(1995) Gene 153: 33-40), MtmE from the mithramycin biosynthesis gene cluster in
Streptomyces argillaceus (64% identity and 68% similarity) (Lombo et al. (1997) J.
Bacteriol. 179: 3354-3357), and TylA2 from the tylosin biosynthesis gene cluster in
Streptomyces fradiae (62% identity and 68% similarity) (Merson-Davies and Cundliffe
(1994) Mol. Microbiol. 13: 349-355) (Fig. 7). A conserved sequence of 14 amino acid
residues close to the N-termini can be easily identified in these proteins; it has been
described as a βαβ fold with an NAD+-binding motif, GxGxxG (Fig. 7, boxed), consistent
with their biochemical role in deoxyhexose biosynthesis (Liu and Thorson (1994) Ann. Rev.
Microbiol. 48: 223-256; Piepersberg (1997) pp. 81-163. In Biotechnology of antibiotics, 2nd
ed. W. R. Strohl (ed). Marcel Dekker, New York). The function of Gdh and MtmE as
TDP-glucose 4,6-dehydratases, requiring NAD+ as a cofactor, has been confirmed by an enzyme
assay following expression of the gdh (Linton et al. (1995) Gene 153: 33-40) and mtmE genes
(Lombo et al. (1997) J. Bacteriol. 179: 3354-3357) in E. coli, respectively, and by
purification of the Gdh protein from Sacc. erythraea (Vara et al. (1989) J. Bacteriol. 171:
5872-5881). From these data, it is reasonable to suggest that sgcA encodes the NGDH
enzyme required for the biosynthesis of the 4,6-dideoxy-4-dimethylamino-5-methylrhamnose
moiety of the C-1027 chromophore.

Transcribed in the same direction as sgcA, the sgcB gene is located 43 nt
downstream of sgcA. It should begin with a GTG at nt 1143, preceded by a probable RBS,
AGGAG, and end with a TGA at nt 2708 (Fig. 6). Correspondingly, sgcB should
encode a 521-amino acid protein with a molecular weight of 52,952 and an isoelectric point
of 4.64. Database comparison of the deduced sgcB product revealed that SgcB is closely
related to a family of membrane efflux pumps, such as LfrA from Mycobacterium smegmatis
(43% identity and 50% similarity, protein accession number AAC43550) (Takiff et al.
(1996) Proc. Natl. Acad. Sci. USA 93: 362-366), OrfA from Streptomyces cinnamomeus
(42% identity and 47% similarity, protein accession number AAB71209) (Sommer et al.
(1997) Appl. Environ. Microbiol. 63: 3553-3560), and RifP from the rifamycin biosynthesis
gene cluster in Amycolatopsis mediterranei (35% identity and 44% similarity, protein
accession number AAC01725) (August et al. (1998) Chem. Biol. 5: 69-79). These proteins are
membrane-localized transporters involved in the transport of antibiotics (conferring
resistance), sugars, and other substances. While direct evidence is lacking for RifP
conferring rifamycin resistance in A. mediterranei by transporting it out of the cells (August
et al. (1998) Chem. Biol. 5: 69-79), it has been proven that LfrA employs the
transmembrane proton gradient in an antiporter mode to drive the efflux of intracellular
antibiotics, resulting in fluoroquinolone resistance in M. smegmatis (Takiff et al. (1996)
Proc. Natl. Acad. Sci. USA 93: 362-366). On the basis of the high degree of amino acid
sequence conservation, an equivalent role could be proposed for SgcB, conferring resistance
by exporting C-1027 from S. globisporus.
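
The coordinates quoted above can be checked against the stated protein lengths with simple arithmetic. The sketch below does this for sgcA and sgcB, and also recomputes the intergenic spacing. It assumes that each quoted end position (nt 1099 and nt 2708) is the last nucleotide of the TGA stop codon, which the text does not say explicitly, although the numbers are consistent with that reading; the function names are hypothetical.

```python
# Illustrative sketch: consistency check of the ORF coordinates quoted in the text.
# ASSUMPTION: the quoted end coordinate is the last nucleotide of the stop codon.

SGCA = (101, 1099)    # ATG start, end of TGA stop
SGCB = (1143, 2708)   # GTG start, end of TGA stop


def encoded_residues(start_nt, end_nt):
    """Amino acids encoded by an ORF with 1-based inclusive coordinates."""
    span = end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1          # subtract the stop codon


def intergenic_gap(upstream_end_nt, downstream_start_nt):
    """Nucleotides lying strictly between two ORFs."""
    return downstream_start_nt - upstream_end_nt - 1


sgca_aa = encoded_residues(*SGCA)
sgcb_aa = encoded_residues(*SGCB)
gap_nt = intergenic_gap(SGCA[1], SGCB[0])

print(f"sgcA: {sgca_aa} aa, sgcB: {sgcb_aa} aa, gap: {gap_nt} nt")
```

Under this assumption the coordinates reproduce the 332- and 521-residue products and the 43-nt spacing reported in the text.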

The cagA gene is clustered with the sgcA and sgcB locus.

To determine if cagA is clustered with the sgcA and sgcB locus, PCR primers
were designed according to the flanking regions of cagA (Sakata et al. (1992) Biosci.
Biotech. Biochem. 56: 1592-1595). A single product with the predicted size of 0.73 kb was
indeed amplified from several of the overlapping cosmids (which cover the 75-kb sgc
cluster), including pBS1004 and pBS1005, the identity of which as cagA was confirmed by
sequencing. Restriction enzyme mapping and Southern hybridization analysis localized
cagA to a single 4.0-kb BamHI fragment that is approximately 14 kb upstream of the sgcA,B
locus (Fig. 5B). The 4.0-kb BamHI fragment was subcloned (pBS1008), and its nt sequence
was determined, revealing the cagA gene along with two additional ORFs (data not shown)
(Fig. 5). As reported earlier, cagA encodes a 142-amino acid protein that is processed by
cleavage of a 32-amino acid leader peptide to yield the mature CagA apoprotein (Sakata et al.
(1992) Biosci. Biotech. Biochem. 56: 1592-1595).

Disruption of the sgcA gene in S. globisporus.

To examine if the cloned sgc cluster encodes C-1027 biosynthesis, sgcA was
insertionally disrupted by a single crossover homologous recombination event to generate
C-1027-nonproducing mutant strains (Fig. 8A). Two plasmids were used, pBS1012 (a
pOJ260 derivative) and pBS1013 (a pKC1139 derivative), each of which contains a 0.75-kb
internal fragment from sgcA (Table 3). After introduction of pBS1012 into S. globisporus
either by PEG-mediated protoplast transformation or E. coli-S. globisporus conjugation,
transformants or exconjugants that were resistant to apramycin were isolated in all cases.
Since pBS1012 is derived from the Streptomyces non-replicating plasmid pOJ260, these
isolates must have resulted from integration of pBS1012 into the S. globisporus chromosome
by homologous recombination. Plasmid pBS1013 was similarly introduced into S.
globisporus. However, since pBS1013 is derived from pKC1139, which carries the
temperature-sensitive Streptomyces replication origin from pSG5 and can replicate normally
at 28°C (Muth et al. (1989) Mol. Gen. Genet. 219: 341-348), these isolates were subjected to
incubation at the non-permissive temperature of 37°C to eliminate free plasmids from the
host cells. As expected, normal growth stopped except for the recombinants that continued to
grow at 37°C, indicative of integration of pBS1013 into S. globisporus by homologous
recombination. The apramycin-resistant S. globisporus SB1001 and S. globisporus SB1002
strains were chosen as representatives of mutant strains with a disrupted sgcA gene resulting
from integration of pBS1012 and pBS1013, respectively.

To confirm that targeted sgcA disruption had occurred by a single crossover
homologous recombination event, Southern analysis of the DNA from the mutant strains was
performed, as exemplified for S. globisporus SB1001, with either pOJ260 or the 0.75-kb
SacII/KpnI internal fragment of sgcA from pBS1010 as a probe. As shown in Fig. 8B, a
distinctive band of the predicted size of 6.3 kb was detected with the pOJ260 vector as a
probe in all mutant strains (lanes 2, 3, and 4); this band was absent from the wild-type strain
(lane 1). Complementarily, when the 0.75-kb SacII/KpnI internal fragment of sgcA was used as
a probe (Fig. 8C), the 3.0-kb band in the wild-type strain (lane 1) was split into two
fragments with sizes of 6.3 kb and 1.0 kb in the mutant strains (lanes 2, 3, and 4), as
would be expected for disruption of sgcA by a single crossover homologous recombination
event.
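
In a single crossover integration the entire plasmid is inserted into the target locus, so the fragments that replace the wild-type band should together equal the wild-type fragment plus the plasmid. The sketch below uses only the band sizes quoted above to back-calculate the size of the integrated DNA and, after subtracting the 0.75-kb sgcA insert, the implied vector backbone. These are derived estimates rather than values stated in the text, gel sizing is approximate, and the names are hypothetical.

```python
# Illustrative sketch: band-size bookkeeping for a single crossover integration.
# The derived sizes are back-calculated estimates, not values stated in the text.

WILD_TYPE_BAND_KB = 3.0            # band seen with the sgcA probe in the wild type
MUTANT_BANDS_KB = (6.3, 1.0)       # bands seen with the sgcA probe in the mutants
SGCA_INSERT_KB = 0.75              # internal sgcA fragment carried by the plasmid


def integrated_dna_kb(wild_type_kb, mutant_bands_kb):
    """Extra DNA at the locus = sum of mutant fragments - wild-type fragment."""
    return round(sum(mutant_bands_kb) - wild_type_kb, 2)


plasmid_kb = integrated_dna_kb(WILD_TYPE_BAND_KB, MUTANT_BANDS_KB)
vector_kb = round(plasmid_kb - SGCA_INSERT_KB, 2)

print(f"integrated plasmid: about {plasmid_kb} kb")
print(f"implied vector backbone: about {vector_kb} kb")
```

That the vector probe lights up only the 6.3-kb band while the sgcA probe detects both bands fits this picture: the vector sequence sits on one of the two junction fragments, whereas a copy of the sgcA fragment flanks the insertion on each side.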

S. globisporus SB1001 and S. globisporus SB1002 are C-1027-nonproducing
mutants.

No apparent difference in growth characteristics and morphologies between
the wild-type S. globisporus and mutant S. globisporus SB1001 and S. globisporus SB1002
strains was observed. While C-1027 production in the wild-type S. globisporus strain could
be detected on day 3, peaked on day 5, and continued for a few more days, as judged by
assaying the antibacterial activity of the culture supernatant against M. luteus (Hu et al. (1988)
J. Antibiot. 41: 1575-1579), C-1027 production was completely abolished in the sgcA mutant
strains S. globisporus SB1001 and S. globisporus SB1002 (Fig. 9A). The latter phenotype
was identical to that of the AF40, AF44, and AF67 mutants, C-1027-nonproducing S.
globisporus strains that have been characterized previously (Fig. 9A and 9C) (Mao et al.
(1997) Chinese J. Biotechnol. 13: 195-199).

In vivo complementation of S. globisporus SB1001.

The ability of the wild-type sgcA gene to complement the disrupted sgcA gene
was tested in the S. globisporus SB1001 strain. The construction of pBS1015, in which the
expression of sgcA is under the control of the constitutive ermE* promoter, was described in
Materials and Methods. Both the pBS1015 construct and the pWHM3 vector as a control
were introduced by transformation into the S. globisporus SB1001 mutant strain. Culture
supernatants from each transformant were bioassayed against M. luteus for C-1027 production.
pBS1015 restored C-1027 production to S. globisporus SB1001 to the wild-type level; no
C-1027 production was detected in the control in which pWHM3 was introduced into S.
globisporus SB1001 (Fig. 9B and 9C). A significant reduction of C-1027 production was
observed when S. globisporus SB1001(pBS1015) was cultured under identical conditions but
without thiostrepton (Fig. 9B vs. 9C), indicating that pBS1015 may be unstable in S.
globisporus SB1001 in the absence of antibiotic selection pressure.

Expression of sgcB in S. globisporus.

The effect of sgcB on C-1027 production was tested in the wild-type S.
globisporus strain. The construction of pBS1018, in which the expression of sgcB is under
the control of the constitutive ermE* promoter, was described in Materials and Methods.
pBS1018 and the pKC1139 vector as a control were each introduced by conjugation into S.
globisporus. Culture supernatants from each exconjugant were harvested on days 3, 4, and
5, and assayed for C-1027 production by determining the antibacterial activity against M.
luteus. While no apparent difference in C-1027 production was observed between the S.
globisporus and S. globisporus (pKC1139) strains, a significant increase in C-1027
production (150±25%) was evident in the early stage of S. globisporus (pBS1018)
fermentation (Fig. 9D, day 3). However, this effect on C-1027 production leveled off as the
fermentation proceeded and became insignificant when the culture reached the late stationary
phase of fermentation (Fig. 9D, days 4 and 5).

Discussion. 

Our inability to clone the putative enediyne PKS gene by PCR, with
degenerate primers designed according to the highly conserved amino acid sequences of
either type I or type II PKSs, or by DNA hybridization, with homologous type I or type II
PKS genes as probes (data not shown), was unexpected, since feeding experiments by
incorporation of [1-13C]- and [1,2-13C]acetate into the enediyne cores of esperamicin (Lam
et al. (1993) J. Am. Chem. Soc. 115: 12340-12345), dynemicin (Tokiwa et al. (1992) J. Am.
Chem. Soc. 114: 4107-4110), and neocarzinostatin (Hensens et al. (1989) J. Am. Chem. Soc.
111: 3295-3299) supported their polyketide origin. Although the enediyne cores are
structurally distinct from either the reduced or aromatic polyketides, the biosynthesis of
which is well characterized by type I or type II PKS, respectively, it could be imagined that
an enediyne PKS catalyzes the biosynthesis of a polyunsaturated linear heptaketide
intermediate that is subsequently cyclized into the enediyne core structure (Hu et al. (1994)
Mol. Microbiol. 14: 163-172; Spaink et al. (1991) Nature 354: 125-130; Thorson et al.
(1999) Bioorg. Chem., 27: 172-188). Alternatively, Hensens and co-workers proposed a
fatty acid origin for the enediyne core that was also consistent with the isotope labeling
results. These authors suggested oleate as a precursor that is shortened by loss of carbons
from both ends and is desaturated via the oleate-crepenynate pathway to furnish the enediyne
core (Hensens et al. (1989) J. Am. Chem. Soc. 111: 3295-3299). The latter pathway
resembles polyacetylene biosynthesis in higher plants and fungi and requires an
acetylene-forming enzyme; a plant gene encoding such an enzyme was identified recently
(Lee et al. (1998) Science 280: 915-918). Our DNA sequence analysis of approximately 60 kb
of the sgc gene cluster did not reveal any gene that resembles a PKS gene.

Although little is known about the resistance mechanism for the enediyne
antibiotics in general, the apoproteins of the chromoprotein type of enediynes could be
viewed as resistance elements that confer self-resistance to the producing organisms by drug
sequestration (Thorson et al. (1999) Bioorg. Chem., 27: 172-188). Such a resistance
mechanism is in fact well established in antibiotic-producing actinomycetes, for example,
BlmA, the bleomycin-binding protein from Streptomyces verticillus (Shen et al. (1999)
Bioorg. Chem. 27: 155-171). Given the fact that antibiotic production genes have invariably
been found to be clustered in one region of the microbial chromosome, consisting of
structural, resistance, and regulatory genes, we adopted a strategy to clone the sgc gene
cluster by mapping a putative C-1027 structural gene to the previously cloned cagA gene,
considered as a resistance gene that encodes the C-1027 apoprotein.

We chose NGDH as the putative C-1027 structural gene on the basis of the
4,6-dideoxy-4-dimethylamino-5-methylrhamnose moiety of the C-1027 chromophore. It has
been well established that all deoxyhexoses could be derived from the common intermediate
of 4-keto-6-deoxyglucose nucleoside diphosphate, the biosynthesis of which from glucose
nucleoside diphosphate is catalyzed by an NGDH enzyme. We cloned the NGDH gene from
S. globisporus by PCR and used it as a probe to screen an S. globisporus genomic library,
resulting in the isolation of the 75-kb sgc gene cluster. DNA sequence analysis of a 3.0-kb
BamHI fragment of the sgc cluster confirmed the presence of sgcA, which encodes the NGDH
protein, along with sgcB, which encodes a transmembrane efflux protein (Fig. 6). The cagA gene
indeed resides approximately 14 kb upstream of sgcA (Fig. 5); DNA sequence analysis of a
4.0-kb BamHI fragment confirmed the identity of cagA along with two additional ORFs
(data not shown). These results underline once again the effectiveness of cloning natural
product biosynthesis gene clusters by exploiting the clustering phenomenon between
resistance and structural genes.

The involvement of the cloned gene cluster in C-1027 biosynthesis was
demonstrated by disrupting the sgcA gene to generate S. globisporus mutants, the ability of
which to produce C-1027 was completely abolished (Fig. 9A), and by complementing the
sgcA mutants in vivo upon expression of sgcA in trans to restore C-1027 production (Fig. 9B
and 9C). These data unambiguously establish that sgcA is essential for C-1027 production,
and thus support the conclusion that the cloned gene cluster encodes C-1027 biosynthesis. It
should be pointed out that, although the sgcA mutants S. globisporus SB1001 and S.
globisporus SB1002 were characterized as C-1027-nonproducing on the basis of the
antibacterial assay alone (Fig. 9A), this phenotype was identical to that of the AF40, AF44,
and AF67 control mutants (Fig. 9A and 9C). The latter strains were isolated
previously upon randomly mutagenizing the wild-type S. globisporus strain with acriflavine
and confirmed to be C-1027-nonproducing by both the antibacterial bioassay and an
antitumor spermatogonial assay (Mao et al. (1997) Chinese J. Biotechnol. 13: 195-199),
providing strong support to the current study. Gene disruption and complementation in S.
globisporus were made possible by the recently developed genetic system that allowed us to
introduce plasmid DNA into S. globisporus via either PEG-mediated protoplast
transformation (Hopwood et al. (1985) Genetic manipulation of Streptomyces: a laboratory
manual. John Innes Foundation, Norwich, UK) or E. coli-S. globisporus conjugation
(Bierman et al. (1992) Gene 116: 43-69; Matsushima and Baltz (1996) Microbiology 142:
261-267; Matsushima et al. (1994) Gene 146: 39-45) for analyzing the sgc biosynthesis gene
cluster in vivo. Given the difficulties encountered with calicheamicin biosynthesis in
Micromonospora echinospora, into which all attempts to introduce plasmid DNA have failed
(Thorson et al. (1999) Bioorg. Chem., 27: 172-188), the latter results underscore the
importance of selecting C-1027 as a model system for enediyne biosynthesis so that many of
the genetic tools developed in Streptomyces species can now be directly applied to the study
of enediyne biosynthesis.

Finally, the function of sgcB was probed by examining C-1027 production
following expression of the gene in the wild-type S. globisporus strain. Database
comparison of the deduced amino acid sequence clearly suggested SgcB as a transmembrane
efflux protein, conferring resistance by exporting C-1027 out of the cell. Hence, in addition
to CagA, SgcB could be viewed as the second resistance element identified for C-1027
biosynthesis. Multiple resistance genes have been identified in numerous antibiotic
biosynthesis gene clusters (Hopwood (1997) Chem. Rev. 97: 2465-2497). It could be
imagined that CagA and SgcB function cooperatively to provide resistance: the C-1027
chromophore is first sequestered by binding to the preaproprotein CagA to form a complex,
which is then transported out of the cell by the efflux pump SgcB and processed by removing
the leader peptide to yield the chromoprotein, although we do not have any experimental
data to substantiate this speculation. Since it is known that yields for antibiotic production
could be profoundly altered by the introduction of extra copies of regulatory, resistance, or 
structural genes into wild-type organisms (Hutchinson (1994) Bio/Technology 12: 375-380), 
we tested the effect of overexpressing sgcB in S. globisporus on C-1027 production. While 
no apparent adverse effect on C-1027 production was observed upon introduction of the
pKC1139 vector into S. globisporus (data not shown), a significant increase in C-1027
production (150±25%) was observed in the early stage of S. globisporus (pBS1017)
fermentation (Fig. 9D, day 3), supporting the predicted function for SgcB in C-1027
biosynthesis. We propose that C-1027 resistance could be a limiting factor at the onset of
C-1027 production, which is circumvented by the extra copy of the plasmid-borne sgcB, and
overexpression of sgcB under the control of the constitutive ermE* promoter results in an
increase of C-1027 production. However, as the S. globisporus (pBS1017) fermentation
proceeds to its stationary phase, C-1027 resistance is no longer a limiting factor for overall
C-1027 production, and the effect of the extra copy of sgcB on C-1027 production consequently
becomes insignificant (Fig. 9D, day 5).

In conclusion, genetic analysis of enediyne biosynthesis has heretofore met
with little success in spite of considerable effort (Thorson et al. (1999) Bioorg. Chem., 27:
172-188). The localization of the sgc gene cluster and characterization of the sgcA and sgcB
genes have now provided an excellent basis for genetic and biochemical investigations
and/or modification of C-1027 biosynthesis, and gene disruption and overexpression in S.
globisporus clearly demonstrated the potential to construct enediyne-overproducing strains
and to produce novel enediynes that may have enhanced potency as novel anticancer drugs
using combinatorial biosynthesis and targeted mutagenesis. We envisage that the results
from C-1027 biosynthesis should facilitate the cloning and characterization of biosynthesis
gene clusters of other enediyne antibiotics in Streptomyces as well as in other actinomycetes,
and could have a great impact on the overall field of combinatorial biosynthesis.

It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of
this application and scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby incorporated by reference in their entirety for all
purposes.